Basics
OLAP (online analytical processing)
Technique for analyzing business data that uses dimensional models often deployed as cubes, which are like multidimensional pivot tables in spreadsheets. Often answers the question 'What happened and why'. OLAP tools can perform trend analysis and enable drilling down into data. They enable multidimensional analysis such as analyzing by time, product, and geography.
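As a rough illustration of multidimensional analysis, the sketch below rolls up a few invented sales rows by any combination of time, product, and geography. The `SALES` rows and the `roll_up` helper are hypothetical and are not taken from any OLAP product; a real cube would precompute or index these aggregates.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions (year, product, region) and one measure (amount).
SALES = [
    {"year": 2023, "product": "hammer", "region": "East", "amount": 120.0},
    {"year": 2023, "product": "nails", "region": "East", "amount": 30.0},
    {"year": 2023, "product": "hammer", "region": "West", "amount": 80.0},
    {"year": 2024, "product": "hammer", "region": "East", "amount": 150.0},
    {"year": 2024, "product": "nails", "region": "West", "amount": 45.0},
]


def roll_up(rows, dims):
    """Sum the 'amount' measure for each combination of the chosen dimensions."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row["amount"]
    return dict(totals)


# Summary view: total sales per year.
by_year = roll_up(SALES, ["year"])

# Drill down: break each year into its products.
by_year_product = roll_up(SALES, ["year", "product"])

print(by_year)
print(by_year_product)
```

Adding a dimension to the `dims` list is the drill-down step; removing one rolls the totals back up.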
Data preparation
The core set of processes for data integration. These processes gather data from diverse source systems, transform it according to business and technical rules, and stage it for later steps in its life cycle when it becomes information used by information consumers.
Analytics
The examination of information to uncover insights that give a business person the knowledge to make informed decisions.
Raw data
A collection of numbers or characters. Data processing commonly occurs in stages, and the "processed data" from one stage may be considered the "raw data" of the next.
Unstructured data
data that is free form or unorganized. Emails, tweets, etc.
Purpose of analytical tools
enable people to query and analyze information using data visualization to communicate findings in an easy-to-understand way.
Reporting (core BI style)
Collecting data from various sources and presenting it to business people in an understandable way so they can analyze it. Reports were initially static with predefined formats but have become interactive and customizable.
Data franchising
Packages data into a BI data store so business people can understand and use it. Although it creates data that is redundant with what's in the data warehouse, it is a controlled redundancy. The data stores may be dependent data marts or cubes. Data franchising takes place after data preparation.
Scorecards
Performance management tools that help managers track performance against strategic goals. (type of dashboard)
Data visualization
Presenting data in a way, such as with graphs and charts, that helps business people glean insights they might not otherwise discern from tabular data. Dashboards and self-service BI use data visualization, but it is only as effective as the quality of the data it draws upon.
Where does BI get its raw data from?
Systems of Record - transaction systems; Systems of Engagement - people, not processes; Systems of Automation - Internet of Things, etc.; Systems of Insight - data lakes
Text or textual analysis
The use of data mining for analysis of unstructured textual data such as emails. Text mining tools help find instances of fraud in thousands of emails or mentions of a company's name in social media.
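A very small sketch of the "mentions of a company's name" idea, assuming the messages are already available as plain strings. The company name "Acme", the sample messages, and the `count_mentions` helper are invented for illustration; real text mining tools go well beyond keyword matching.

```python
import re


def count_mentions(messages, company):
    """Count whole-word, case-insensitive occurrences of a company name."""
    pattern = re.compile(r"\b" + re.escape(company) + r"\b", re.IGNORECASE)
    return sum(len(pattern.findall(message)) for message in messages)


# Hypothetical social media posts.
MESSAGES = [
    "Loving the new Acme drill!",
    "acme support never replied to me.",
    "Bought a ladder from another shop.",
    "ACME, Acme, everywhere.",
]

mentions = count_mentions(MESSAGES, "Acme")
print(mentions)
```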
5 C's for a BI program
Clean, Consistent, Conformed, Current, Comprehensive
Operational BI
Queries and reporting are performed on operational systems themselves, as opposed to the data warehouse. Most enterprises need a mix of operational BI and analytical BI from the DW.
SQL
Structured Query Language: the standard computer language for relational database management and data manipulation. SQL is used to query, insert, update, and delete data.
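The basic data manipulation statements can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the `customer` table and its rows are made up, and SQLite stands in for whichever relational database is actually in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# INSERT: add rows.
conn.executemany(
    "INSERT INTO customer (id, name, city) VALUES (?, ?, ?)",
    [(1, "Ada", "Leeds"), (2, "Ben", "York"), (3, "Cy", "Leeds")],
)

# UPDATE: change an existing row.
conn.execute("UPDATE customer SET city = ? WHERE id = ?", ("Hull", 2))

# DELETE: remove a row.
conn.execute("DELETE FROM customer WHERE id = ?", (3,))

# SELECT: query what is left.
rows = conn.execute("SELECT id, name, city FROM customer ORDER BY id").fetchall()
print(rows)
```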
Report (or analytical) governance
BI deliverables need solid report governance in order to provide consistent information with which the business can make decisions. Report governance includes managing not only reports but also dashboards, scorecards, self-service BI, ad hoc query, OLAP analysis, predictive analytics, data visualization, data mining, and spreadsheets along with the data used.
Predictive analytics
An advanced form of analytics that uses business information to find patterns and predict future outcomes and trends. Determining credit scores by looking at a customer's credit history and other data is a typical use for predictive analytics.
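To make "find patterns and predict future outcomes" concrete, here is a deliberately simple sketch that fits a straight-line trend to past values and extrapolates one step ahead. It is not how credit scores are produced; the monthly figures and the `fit_line` helper are hypothetical, and ordinary least squares is used only because it is easy to follow.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical history: units sold in months 1 to 4.
months = [1, 2, 3, 4]
units = [10, 12, 14, 16]

intercept, slope = fit_line(months, units)

# Predict the next month by extending the fitted trend.
forecast = intercept + slope * 5
print(forecast)
```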
BI application
Any BI project deliverable(s) that the BI team develops for business people to use in their analysis. e.g. dashboard, OLAP cube, predictive model etc
Dashboards
A BI tool that displays numeric and graphical information on a single display, making it easy for a business person to get information from different sources and customize the appearance.
Structured data
Data that can be organized in a pre-defined record or file and may be stored in a database or spreadsheet. E.g. enterprise's sales, employee and financial data
Self-service BI
Intuitive tools that allow BI consumers to obtain the information they need without the help of the IT group.
In-memory analytics
Leveraging advances in memory to provide faster and deeper analytics by querying data held in a system's random-access memory (RAM) instead of on disk. Architectural options include in-memory analytics in the BI tools, as part of the database, or on the BI appliance platform.
3 types of processes
Management processes, e.g. strategic planning; Operational processes, e.g. taking orders, opening a bank account; Supporting processes, e.g. recruitment, call centre
3 main databases
MySQL, Oracle, SQL Server
What does BI deliver to the business?
Reports, Dashboards, Visualisation, Analytics, and sometimes even Predictive Analytics
Data virtualization
Retrieving and manipulating data without requiring details of how it is formatted or where it is located. It enables enterprises to expand the data used in their analysis without requiring that it be physically integrated. They do not have to get IT involved (via business requirements, data modeling, and ETL and BI design) every time data needs to be added, allowing them to focus more on data discovery. Also called data federation and formerly called enterprise information integration (EII).
ETL (extract, transform, load)
The process in which data is taken from the source system, transformed, and stored in a data warehouse or database. ETL tools automate data integration tasks.
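The three ETL steps can be sketched end to end with the standard library. Everything here is assumed for illustration: the CSV text stands in for a source system, the rule of skipping rows with an unparseable amount is an invented business rule, and an in-memory SQLite table plays the role of the target database.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; a real source would be a file or an operational database.
SOURCE_CSV = """order_id,customer,amount
1, ada ,10.50
2,BEN,7.25
3,cy,not available
"""


def extract(text):
    """Extract: read the raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy names, convert types, and drop rows that break the (assumed) rule."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # assumed business rule: skip rows with an unparseable amount
        clean.append((int(row["order_id"]), row["customer"].strip().title(), amount))
    return clean


def load(rows, conn):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
loaded = conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
print(loaded)
```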
Data mining
This process analyzes large quantities of data to find patterns such as groups of records, unusual records, and dependencies. Data mining helps businesses sift through data to find patterns and relationships they do not yet know, such as "what is the likelihood that a customer who buys our hammer will also buy our nails?"
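The hammer-and-nails question can be answered on a toy scale by computing the confidence of the rule "hammer implies nails", that is, the share of baskets containing a hammer that also contain nails. The baskets and the `confidence` helper below are made up for illustration; real data mining tools search for such rules across many items automatically.

```python
# Hypothetical shopping baskets, one set of items per purchase.
BASKETS = [
    {"hammer", "nails"},
    {"hammer", "nails", "glue"},
    {"hammer"},
    {"nails"},
    {"hammer", "tape"},
]


def confidence(baskets, antecedent, consequent):
    """Fraction of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    both = sum(1 for b in with_antecedent if consequent in b)
    return both / len(with_antecedent)


likelihood = confidence(BASKETS, "hammer", "nails")
print(likelihood)
```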
5 V's of Big Data
Variety, Volume, Veracity, Velocity, Value
BI appliance
bundled hardware and software aimed at making it easier and more cost-effective for enterprises to purchase, use and maintain their BI solution. NB: there is a wide variety of architectures used in appliances, so a formal evaluation and proof of concept (POC) are highly recommended to ensure a match with the situation.
Attribute
A characteristic of an entity. An attribute may identify a specific entity, relate an entity to another entity, or describe the entity.
Software
the programs and other operating information used by a computer.
MDM (master data management)
the set of processes used to create and maintain a consistent view, also referred to as a master list, of key enterprise reference data. This data includes such entities as customers, prospects, suppliers, employees, products, services, assets, and accounts. It also includes the groupings and hierarchies associated with these entities.
3 Types of decisions
1) Strategic decisions 2) Tactical decisions 3) Operational decisions
Two OLAP camps
MOLAP (multidimensional) and ROLAP (relational). HOLAP (hybrid) combines them.
Five C's of data
clean, consistent, conformed, current, comprehensive
2 key benefits of BI appliance
1) Scalability 2) flexibility
4 Analytical types
1) descriptive (what happened?) 2) diagnostic (why it happened?) 3) predictive (what is likely to happen?) 4) prescriptive (what actions should be taken?) NB: descriptive = core/common; the rest = advanced!
Dimensional modeling
A generally accepted practice in the data warehouse industry to structure data intended for user access, analysis and reporting in dimensional data models.
Ad hoc query
A non-standard inquiry. An ad hoc query is created to obtain information as the need arises. Contrast with a query that is predefined and routinely processed. Tools for ad hoc querying can help you manipulate data for analysis and report creation. Most business people, however, do not really need ad hoc querying; they do fine with interactive reporting and data discovery.
Data governance
A process that enforces consistent definitions, rules, business metrics, policies, and procedures for how an enterprise treats its data. It can encompass many areas, including data creation, movement, transformation, integration, and definitions, all the way to consumption. A data governance program helps the organization treat its data as a corporate asset and maximize its value, but the process of governance is challenged by data that is unstructured and from the cloud, as well as by Big Data.
Data mart
A subset of a data warehouse that's usually oriented to a business group or process rather than enterprise-wide views. Data marts have value as part of the overall enterprise data architecture, but can cause problems when they sprout uncontrolled as data silos with their own data definitions, creating data shadow systems.
ODS (operational data store)
A type of database sometimes used in a BI data architecture. Unlike a data warehouse, an ODS may serve both analytical and operational functions.
Data quality
Achieved when data embodies the 'five C's': clean, consistent, conformed, current, comprehensive
EAI
Enterprise application integration / SOA: tools and methods for consolidating and integrating the applications that exist in an enterprise. NB: goal is usually to protect the investment in legacy applications and databases while adding or migrating to a new set of applications that exploit the internet, e-commerce, extranet, and other technologies.
Relational Database basics
Entities are represented in tables; each instance of an entity is a separate row; attributes become columns; relationships are built between tables; Structured Query Language (SQL) is used for data manipulation
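A minimal sketch of these basics using Python's built-in `sqlite3` module: two entities become two tables, each instance is a row, attributes are columns, and a relationship is built with a key column that a join follows. The `customer` and `sales_order` tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Entity: customer. Attributes: customer_id, name.
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Entity: sales_order. The customer_id column relates each order to a customer.
CREATE TABLE sales_order (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount REAL NOT NULL
);
-- Each instance of an entity is one row.
INSERT INTO customer VALUES (1, 'Ada'), (2, 'Ben');
INSERT INTO sales_order VALUES (10, 1, 25.0), (11, 1, 5.0), (12, 2, 40.0);
""")

# Follow the relationship between the tables with a join.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customer AS c
    JOIN sales_order AS o ON o.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(totals)
```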
BI tool
a vendor's software tool used to develop the BI application and deliver one or more BI styles
Data profiling
an essential part of the data quality process; this involves examining source system data for anomalies in values, ranges, frequency, relationships, and other characteristics that could hobble future efforts to analyze it. It enables early detection of problems.
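A profiling pass over a single column might look something like the sketch below, which reports missing values, distinct values, range, and the most frequent value. The `profile` helper and the sample ages are invented; dedicated profiling tools check many more characteristics, including relationships between columns.

```python
from collections import Counter


def profile(values):
    """Summarize one column: size, missing values, distinct values, range, top value."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "most_common": Counter(present).most_common(1)[0] if present else None,
    }


# Hypothetical source column: customer ages, with one missing and one implausible value.
AGES = [34, 29, None, 34, 210, 41]

report = profile(AGES)
print(report)
```

Here the maximum of 210 and the single missing value are the kind of anomalies that profiling is meant to surface early.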
BI styles
different BI application types that a business person may use in performing their analysis, e.g. reporting, dashboards, scorecards, OLAP/pivot analysis, ad hoc query, statistical analysis, alerting/notifications, data discovery, data visualization, spreadsheets, etc.
BI market
referring to just the top layers of the BI architectural stack such as reporting, analytics, and dashboards
Business intelligence
set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information used to enable more effective strategic, tactical and operational insights and decision making
Data
set of values of qualitative or quantitative variables; put another way, pieces of data are individual pieces of information
Entity
something that exists and is capable of being described.
Metadata management
The management of support data (metadata) that is accessible to the business community throughout an enterprise. In managing metadata, an enterprise needs to understand what the data means, how it was transformed from creation to consumption, and its associated data quality.
Hardware
the machines, wiring, and other physical components of a computer or other electronic system.
Data cleansing
the process of finding and fixing errors, inconsistencies and inaccuracies in data. The level of cleanliness required depends on each industry's best practices. Data quality tools are used for the more complex processing, while data integration tools perform basic processing.
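The basic end of this processing can be sketched as follows: whitespace and letter case are normalized, and records that turn out to share an email address are treated as duplicates. The `cleanse` helper, the sample records, and the choice of email as the matching key are all assumptions made for illustration.

```python
def cleanse(records):
    """Normalize names and emails, then drop records whose email was already seen."""
    seen = set()
    cleaned = []
    for rec in records:
        name = " ".join(rec["name"].split()).title()  # collapse spaces, fix casing
        email = rec["email"].strip().lower()
        if email in seen:
            continue  # assumed rule: same email means same person
        seen.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned


# Hypothetical raw customer records with inconsistent formatting and a duplicate.
RAW = [
    {"name": "  ada   lovelace", "email": "Ada@Example.com "},
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "BEN SMITH", "email": "ben@example.com"},
]

result = cleanse(RAW)
print(result)
```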