Business Intelligence Q's and A's
Describe the subsystems of a Data Lake
Enterprise IT Data Exchange - Provide data movement, preparation, governance Data Lake Repositories - Support a wide range of data and workloads Catalogue - Search for, locate, and download data and artefacts - Provision sandboxes - Develop data management models, define governance policies, perform impact analysis - Add additional insight through automated analysis Analytics Engines Raw Data Interaction Self Service - Data scientists need raw data to create analytics suitable for deployment into production systems View Based Interaction Self Service - Citizen analysts need data that is refined for their needs
Compare structured data and big data in terms of perceived value
- Structured data "High known value per byte, high cost per compute" - Big Data "Low known value per byte, low cost per compute"
Describe MapReduce
- Task tracker, job tracker 1. Application contacts job tracker 2. Job tracker (master) splits computation across slave task trackers 3. Task trackers send finished tasks back to job tracker 4. Job tracker sends the combined result back to application
What is a project and what are common characteristics of a project?
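The flow above can be sketched in plain Python as a word count. This is a single-process illustration of the map, shuffle, and reduce steps, not the Hadoop API; every function name here is hypothetical, and `run_job` only stands in for the job tracker.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group emitted values by key so each key can be reduced independently."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: combine the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def run_job(chunks):
    """Stand-in for the job tracker: hand out chunks, collect and combine results."""
    emitted = []
    for chunk in chunks:  # on a cluster, each chunk would go to a different task tracker
        emitted.extend(map_phase(chunk))
    return reduce_phase(shuffle(emitted))


counts = run_job(["big data big value", "data lake"])
print(counts)
```

On a real cluster the map calls run in parallel on different machines, which is where the processing power comes from.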
A project is a temporary endeavour undertaken to accomplish a unique purpose. Common characteristics include - a target outcome - defined life span - cross-organizational participation - new or unique - time, cost, and performance requirements
What is a data silo?
A repository of fixed data that remains under the control of one department and is isolated from the rest of the organization.
What is a schema?
A schema is a graphical depiction of the database structure that defines the objects in the database.
What is Waterfall Software Development?
A sequential, non-iterative software design process in which progress is seen as flowing steadily downwards through the phases of: 1) Conception 2) Initiation 3) Analysis 4) Design 5) Construction 6) Testing 7) Production/Implementation 8) Maintenance - testing done at the end, could lead to testing being squeezed if other stages are late
What is an architectural framework?
An architectural framework defines how to create and use an enterprise architecture and is needed to enable BI projects, both new and existing, to complement each other and create cost-effective, cohesive solutions. - should be designed to accommodate expansion and be able to overcome the challenges of the future
What is a Data Integration Architecture?
An architecture with the objective of gathering data from inside and outside the enterprise and transforming it into information the business uses day-to-day and for the future
What is a prototype?
An early sample, model, or release of a product built to test a concept or process or to act as a thing to be replicated or learned from
Using the Good Chart Matrix, describe what makes a good chart.
Axes = design execution, contextual awareness Good charts should have strong contextual awareness - What am I trying to say and to whom Good charts also have strong design execution - How well is the chart constructed
What are Pattern Icons?
Can be used in white-boarding sessions and for documenting design decisions in reference architectures and solution specifications
What is scope creep?
Continuous or uncontrolled changes or growth in a project's scope, at any point after the project begins
What is the difference between conceptual and data driven information?
Conceptual - visualizing concepts and qualitative information - focus on ideas - simplify and teach Data-Driven - plotting data and information - focus on statistics - inform and enlighten
What are the phases of a project?
Core: - Scoping - Prototyping - Testing Detailed: - Scope and plan - Analysis and definition - Architect and design - Build and test - Implement - Deploy and roll out
What is the difference between an ETL and an ELT architecture?
ETL = Extract, Transform and Load - Involves extracting data from source systems, transforming them in a data integration server and loading them into the target database ELT = Extract, Load and Transform - Involves extracting data from source systems and loading and transforming them into the target database using the data integration repository and integration services
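A minimal sketch of the difference, assuming an in-memory SQLite database as the target and a made-up two-row extract. In `etl` the transformation happens in Python before loading (standing in for the integration server); in `elt` the raw rows are loaded first and transformed with SQL inside the target. Table and column names are illustrative only.

```python
import sqlite3

# Hypothetical extract from a source system: names and amounts arrive as text.
source_rows = [("alice", "10.50"), ("bob", "3.25")]


def etl(rows):
    """ETL: transform outside the target, then load the finished rows."""
    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    transformed = [(name.upper(), float(amount)) for name, amount in rows]
    target.executemany("INSERT INTO sales VALUES (?, ?)", transformed)
    return target.execute(
        "SELECT customer, amount FROM sales ORDER BY customer"
    ).fetchall()


def elt(rows):
    """ELT: load the raw rows into staging, then transform inside the target."""
    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE staging (customer TEXT, amount TEXT)")
    target.executemany("INSERT INTO staging VALUES (?, ?)", rows)
    target.execute(
        "CREATE TABLE sales AS "
        "SELECT UPPER(customer) AS customer, CAST(amount AS REAL) AS amount "
        "FROM staging"
    )
    return target.execute(
        "SELECT customer, amount FROM sales ORDER BY customer"
    ).fetchall()


etl_result = etl(source_rows)
elt_result = elt(source_rows)
print(etl_result == elt_result)
```

Both routes end with the same target table; what differs is where the transformation work is done.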
What makes an effective project manager and an ineffective project manager?
Effective: - Lead by example - Visionary - Technically competent - Decisive - Good communicators - Good motivators - Stands up to top management when necessary - Supports team members - Encourages new ideas Ineffective: - Set bad examples - Are not self-assured - Lack technical expertise - Poor communicators - Poor motivators
What are personas?
Fictional characters created to represent the different user types that might use a service/product
What is the Value Proposition Canvas?
Focuses on matching the value proposition and target segment customer profile to identify a sweet spot
What is the difference between functional and non-functional requirements?
Functional Requirements - what the users will get Non-functional requirements - the desirable qualities of a system and the constraints with which the system will be built - Qualities = the properties and characteristics the system will demonstrate - Constraints = limiting factors that must be taken into account
What are the governance domains and focus areas?
Governance Focus Areas - Risk management - Intellectual property - Export controls - Financial reporting - Privacy Governance Domains - Information governance - IT governance - Security
Name 7 ways Data Science and Machine Learning are changing business
Each is a shift from the first to the second: Historic Performance to Prediction, Batch Processes to Real Time, Structured Data to Unstructured Data, Reports and Segments to Data Products, Classical Statistics to Machine Learning, DIY to Crowd Sourced, Known Unknowns to Unknown Unknowns
What are the current trends for thinking visually?
Improved visualization - More sophisticated, higher quality data visualization has raised the standard Data - proliferation of data requires a new way of communicating meaning Participation - everyone's doing it more than before
How do we build trust in Big Data?
Need trust to share and to consume data: - Need understanding of quality, origin, and ownership of data - Need classification of data to govern and protect it - Need timely, reliable data feeds and results - All built on secure and reliable framework
What is the difference between OLAP and OLTP?
OLAP = - more sophisticated than OLTP - meant to work with aggregated data OLTP = - slow, will get slower - takes considerable processing power and user patience - relational database model was designed for transactional processing and is not the best at solving business queries
Outline the potential stakeholders for a project team.
People involved in or affected by project activities - Project sponsor, project manager, project team, support staff, customers, suppliers, opponents
What is PII?
Personally Identifiable Information is any data that could potentially identify a specific person.
What is MoSCoW?
Mnemonic for prioritizing requirements - Must have - Should have - Could have - Won't have (this time)
Define project management
Project Management is the application of knowledge, skills, tools and techniques to project activities to meet project requirements
What are Pattern Descriptions?
Provide guidance for specific information supply chains and solutions
What is raw data?
Raw data is unprocessed; a collection of numbers and characters. - the processed data of one stage may be the unprocessed data of the next
What is stream analytics?
Real time data analytics to power intelligent action - Can give an edge over competition by spotting trends faster
Name other types of relevant legislation and describe them
Regulation of Investigatory Powers Act 2000 - framework for controlling the lawful interception of communications Freedom of Information Act - gives general right of access to information held by public authorities for carrying out their functions Investigatory Powers Act 2016 - extends the reach of state surveillance - requires ISPs to store web browsing histories for 12 months and give the police, security services, and official agencies access to the data
What is SPD?
Sensitive Personal Data is data concerning a data subject's racial or ethnic origin, political opinions, religious beliefs, trade union activities, physical or mental health, sexual life, or details of criminal offences.
What is Enterprise Data Discovery?
Simplifies access to wildly diverse information, making it available for instant exploration, regardless of state, giving both the business and IT unprecedented visibility into its value to the organization using: - An intuitive dynamic interface - Natural language processing - Highly flexible architecture - Significant in-memory processing power
What are the visual design methods?
Sketch Wireframe - representation of a design Storyboard - lays out major actions that happen as user uses the app Mock-Up
What are the benefits of standards in BI design and development?
Standards in layout, platform and charts brings benefits of user productivity, familiarity, and reusability
What is NoSQL?
Stands for Not Only SQL; it is not a programming language but a class of non-relational database systems for storing and managing unconventional types of data.
What is structured data?
Structured data is data that can be organized in a pre-defined record or file and may be stored in a database or spreadsheet.
Describe the systems involved in an information architecture as detailed by Forrester
Systems of Record - Host processes Systems of Automation - Connect the physical world Systems of Engagement - Touch people Systems of Insight - Power digital business
From what systems does BI get its data?
Systems of Record - context data related to the business transactions of the organization Systems of Automation - big data from sensors monitoring an asset or location Systems of Insight - analytics based on historical data collected from multiple sources Systems of Engagement - big data about the activity of individuals
Describe the systems related to a data architecture
Systems of Record - data is captured and updated in operational and transactional applications Systems of Integration - gather, integrate, and transform data from SoRs into consistent, conformed, comprehensive, current, and clean information. Systems of Analytics - provide business information that has been integrated and transformed to BI applications for business analysis
What are the two dimensions of project management?
Technical - Scope, WBS, schedules, resource allocation, baseline budget, status report Sociocultural - Leadership, problem solving, teamwork, negotiation, politics, customer experience
Describe the two types of Metadata
Technical/Structural Metadata - the description of data as it is processed by software tools used to enable software to understand and process data Business Metadata - the description of information from a business perspective
What is data discovery?
The user driven process of searching for patterns or items in a data set.
What is a Blind Zone?
The widening gap between the data available to the organization and the percentage of data that can be processed
Compare and contrast the triple and quadruple constraint triangles
Triple Constraint Triangle = Time, Scope, Cost Quadruple Constraint Triangle = Time, Scope, Cost, Quality - Maybe also Safety
What is unstructured data?
Unstructured data is data that is free from form or organization.
Explain four types of application testing
User Acceptance Test = does application meet criteria? is it usable? Smoke Test = preliminary testing, light load Stress Test = heavy load (150%) to see where system collapses Soak Test = long load, to find issues when running for a long time
What are the 5 V's of Data?
Volume - refers to the vast amount of data generated every second Variety - refers to the different types of data we can now use Velocity - refers to the speed at which new data is generated and moved Veracity - refers to the messiness or trustworthiness of data Value - data must be able to deliver value to organizations
What is Hadoop?
- A platform specifically built for storing, managing and analyzing big data - Open source - Kicked off the ubiquity of big data analytics - Lets you store data in native format and attain value from it through massive parallel processing - Runs on a cluster of machines - Work is distributed between machines to enable large processing power - Framework of tools - Supports running of applications on big data
What is a BI Working Committee?
- Bridges individual BI projects and steering committee - Tackles cross-project issues
What are the business responsibilities in running a BI project?
- Define what the BI solution will deliver to the business - Identify business value of BI solution - Define requirements
What are the responsibilities for IT in running a BI project?
- Determining data needs for business BI - Data profiling to identify data sources - Data modelling to enable integration and BI - Developing data and information architecture - Physically doing data integration - Designing and developing infrastructure to support this
What is the role of a Project Management Office?
- Establish a uniform organizational approach to systems, processes and procedures - Carry out the relevant configuration management functions - Disseminate project instructions and other information - Collect, retrieve, or chase information required by the project manager - Feed into any program management office that may exist
What are the 5 rules of Enterprise Data Discovery?
- Govern Self-Service for Results without Risk - Blend Diverse Data for Deeper Insights - Equip Yourself with Integrated Search, Navigation, and Analytics - Always have a dialogue with your data - Enrich diverse data and keep on discovering
What is a BI Steering Committee?
- Guides and supports project or program - Made up of business and IT - Discusses issues and concerns - Helps resolve issues and enlists support
What comprises the Hadoop Architecture?
- MapReduce - Hadoop File System - Projects
What is information governance?
- Understanding of the information possessed - Confidence to share and reuse information - Protection from unauthorized use of information - Monitoring of activity around the information - Implementation of key business processes that manage information - Tracking the provision of information - Management of the growth and distribution of information
Idea Illustration
- declarative and conceptual - simple, metaphorical visualization type "consultants corner"
Idea Generation
- exploratory and conceptual - complex and undefined visualizations - metaphorical, creative visualization type "whiteboard/napkin"
What are four new sources of Big Data that are adding value?
1) Clickstream (web, mobile, app browsing) - linking online and offline sales and browsing behaviour 2) Natural language processing and sentiment analysis techniques - enabling the capture and utilization of unstructured data 3) Geolocation and Microlocation 4) IoT/Smart Homes - will give new data on how user interacts with home
What are the four types of chart?
1) Comparison 2) Distribution 3) Relationship 4) Composition
What is Dunnhumby's blueprint for business success?
1) Data 2) Insight 3) Action 4) Changes in Consumer Behaviour
Describe the stages of Data Integration
1) Data Preparation - Gathering, reformatting, consolidating, transforming and cleansing data in staging areas and the data warehouse 2) Data Franchising - Taking data from the data warehouse and making it available in convenient and understandable form for business analytics
What happens when we see a chart? What should we keep in mind for producing charts?
1) We don't go in order - we look visually and scan for contextual clues • we should provide clues to imply meaning 2) We see what stands out first - our eyes go directly to contrasts, unique aspects, clusters, outliers, etc • whatever stands out should match the idea that is being conveyed 3) We only see a few things at once - the more data that is plotted, the more singular an idea it conveys (trees = forest) • if we need to focus on individual data points, as few as possible should be plotted 4) We seek meaning and make connections - our minds try to assign meaning to a visual and make causal connections regardless of whether they exist • if visual elements are presented together, they should be related in a meaningful way to prevent false narratives 5) We rely on conventions and metaphors - we use learned shortcuts to assign meaning to visual cues on the basis of common expectations • embrace ingrained conventions to prevent confusion
What is a data architecture?
A data architecture defines the data along with the schemas, integration, transformations, storage, and workflow required to enable the analytical requirements of the information architecture.
What is a Work Breakdown Structure?
A hierarchical decomposition of the total scope of work to be carried out by the project team to accomplish the project objectives and create the required deliverables
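As a rough illustration, a WBS can be held as a nested structure whose leaves are work packages. The project, task names, and day estimates below are invented for the example; the only point is that totals roll up from the leaves to the top.

```python
# Hypothetical WBS: every name and estimate here is made up for illustration.
wbs = {
    "name": "BI dashboard project",
    "children": [
        {
            "name": "Requirements",
            "children": [
                {"name": "Interview stakeholders", "days": 3},
                {"name": "Write specification", "days": 2},
            ],
        },
        {
            "name": "Build",
            "children": [
                {"name": "Data integration", "days": 8},
                {"name": "Dashboard development", "days": 5},
            ],
        },
    ],
}


def total_days(node):
    """Roll effort up the hierarchy: a leaf reports its own days, a parent sums its children."""
    if "children" not in node:
        return node["days"]
    return sum(total_days(child) for child in node["children"])


project_total = total_days(wbs)
print(project_total)
```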
What is PROCESS?
A mnemonic for the interview process for gathering requirements - Plan the interview - Rehearse the Interview - Open the Interview - Collect the Data - End the Interview - Summarize the Interview - Synthesize what's known and not known
What is Agile Software Development?
A set of principles for software development under which requirements and solutions evolve through the collaborative effort of self-organizing, cross-functional teams. It advocates adaptive planning, evolutionary development, early delivery, and continuous improvement, and encourages rapid and flexible responses to change. Key = working more closely with the business, wider range of skills, can adapt as requirements change - testing is done throughout
What are relational databases?
A set of tables with relationships built between them. Entities are represented in tables, attributes are columns. Each instance of an entity is a separate row.
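A small sketch using Python's built-in `sqlite3` module, with two hypothetical entities (`customer` and `purchase`). Each entity is a table, each attribute a column, each instance a row, and the relationship is the `customer_id` column that a join follows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),  -- relationship to customer
        amount REAL
    );
    INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO purchase VALUES (1, 1, 20.0), (2, 1, 5.0), (3, 2, 7.5);
    """
)

# Follow the relationship with a join to total purchases per customer.
totals = conn.execute(
    """
    SELECT c.name, SUM(p.amount)
    FROM customer AS c
    JOIN purchase AS p ON p.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(totals)
```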
What is a Project Definition Workshop?
A structured 1/2, 1, or 2 day meeting with decision makers - Agree the project's goals, objectives and scope - Define the project's key tasks, structure, management, control mechanisms - Identify the risks, issues, assumptions, probabilities, and impacts - Commit the main players to the project and the key decisions - Impel follow on actions & impart momentum to the project - Agree the completion criteria and deliverables - Results in a signed off Project Definition Report
What is machine learning?
A type of AI, powered by Big Data, that provides computers with the ability to learn without being explicitly programmed
Explain the Expanded DIKW Pyramid
At the base we have data. We put data into context to derive information. We analyze information and draw inferences to develop understanding. We apply professional judgements to understanding to gain knowledge. We apply decision making tools and processes to knowledge to arrive at actionable decisions.
Describe the BI Tool Components. What is the purpose of each component?
BI Manager - Application controller that orchestrates processes and interfaces with the BI repository and its supporting metadata BI Repository - Stores the metadata used by the application - E.g. screen layouts, data definitions, filters, annotations, workflow, version control, Access Control Lists Data Access - Brings data into memory so BI tools can work with it Data Transformation - Allows apps to transform data based on business' analytical needs Presentation & Analysis - The component the business interacts with to view, select, and analyze data
Describe BI Application Testing
BI App Developed 1. Developer Unit Tests 2. User Unit Tests BI Applications 3. Developer Integration Tests 4. User Acceptance Testing BI System 5. Systems and Performance Test
What is Business Intelligence?
Business Intelligence is a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information used to enable more tactical and strategic decisions and allow for operational insights.
Describe the Stepwise Refinement of Requirements
Business Requirements to Data Requirements, Functional Requirements, Technical Requirements, Regulatory & Compliance Requirements to BI requirements
How do businesses create data? What are the three types of business processes?
Businesses create and use data through business processes, which are a collection of related, structured activities or tasks that produce a specific service or product or serve a particular goal for particular customer(s). 3 Types of Processes Management Processes - the processes that govern the operation of a system Operational Processes - processes that constitute the core business and create the primary value stream Supporting Processes - support the core processes
What are the components needed for a system of insight?
Data - Ideally placed directly into the flow of data Cloud - All systems of engagement are born in the cloud Watson & Cognitive system - An engine that drives the System of Insight
Describe the core building blocks of Business Intelligence
Data Integration - integrating and cleansing data from multiple sources Data Warehousing - storing integrated data Business Intelligence - presenting and analyzing information
What is data?
Data is a set of values of qualitative or quantitative variables. "Individual pieces of information"
Describe Defense in Depth
Defense in depth is the basis for a good security approach based on - Defense beyond secure zone - Multiple layers of security - Different forms of defense - Protection against a weakness in any one layer - Each layer can protect against different types of attack - Weakening the enemy incrementally - Encouraging attacks on points of strength
What are Pattern Names?
Define a vocabulary to discuss issues and technology options related to information architecture, governance and management
What is the product architecture?
Defines the products, their configurations, and how they are interconnected to implement the technology requirements of the business intelligence framework.
What is a technical architecture?
Defines the technologies that are used to implement and support a BI solution that fulfills the data and information architecture requirements Includes: design, development, testing, deployment, maintenance, support, etc.
What are the components of the Project Life Cycle?
Defining - Goals, specifications, tasks, responsibilities Planning - Scheduling, budgets, resources, risks, staffing Executing - Status reports, changes, quality, forecasts Delivering - Handing over to production, closing project and documenting lessons
What are BI Applications?
Deliverables built by the BI team for business people to use in their analysis
What is meant by a deliverable?
A deliverable is something that the project team delivers, e.g. report, dashboard, OLAP cube, model, visualization, etc.
Describe the various life cycles of information and their characteristics
Information Asset - Slowly changing - Highly duplicated - Multiple formats Information Code - Set based information - Sets change infrequently, individually rarely change - Many representations, widely distributed and mapped Information Activity - Rapid change during active phase - Then little to no change - Limited distribution while active Information Event - No change once created - Distributed as required
Explain the DIKW Pyramid
Information is defined in terms of data, knowledge in terms of information, and wisdom in terms of knowledge
What is Big Data?
Information that cannot be processed or analyzed using traditional processes or tools
Describe the two types of complexity
Inherent - Exists in the business challenges and technology used Induced - Extra functionality because we extrapolate requirements to what we think the business or users want
What is the difference between manual and automatic metadata capture?
Manual metadata capture - User manually defines metadata after the event - Could be done too late or inaccurately Automatic metadata capture - Metadata is created by the application at the same time that the data is created in a standardized format - automated
What is the Gartner Technology Hype Cycle?
Measures the hyperbolic cloud surrounding technologies and charts how products progress from a peak of inflated expectations through a trough of disillusionment to an upward slope of enlightenment.
What is the GDPR?
The General Data Protection Regulation is an EU wide regulation that comes into force on May 25, 2018, bringing new changes to: Fines - maximum increased from 500K to 20M or 4% of annual global turnover, whichever is higher Accountability - proof of compliance - appointment of DPO? Breach notifications - report within 72 hours - tell individuals Right to Erasure - users can request the removal of personal data Right to Portability - users can obtain their personal data and reuse it as they wish
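The fine ceiling can be worked through as simple arithmetic. This sketch assumes the cap is the greater of the fixed 20M amount and 4% of annual global turnover, with both in the same currency; the function name and the turnover figures are hypothetical, and it shows only the upper bound, not how an actual fine would be set.

```python
FIXED_CAP = 20_000_000      # the fixed ceiling from the answer above
TURNOVER_PERCENT = 4        # percentage of annual global turnover


def maximum_fine(annual_global_turnover):
    """Upper bound on a fine: the greater of the fixed cap and 4% of turnover."""
    turnover_cap = annual_global_turnover * TURNOVER_PERCENT // 100
    return max(FIXED_CAP, turnover_cap)


# Hypothetical companies: for the smaller one the fixed cap dominates,
# for the larger one the percentage does.
small_company_cap = maximum_fine(100_000_000)
large_company_cap = maximum_fine(1_000_000_000)
print(small_company_cap, large_company_cap)
```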
What is the Project Management Body of Knowledge, and what are its knowledge areas?
The Project Management Body of Knowledge (PMBOK) is a set of standard terminology and guidelines for project management 9 knowledge areas: - Core Function Areas - Time management - Cost management - Scope management - Quality management Facilitating Function Areas - HR management - Procurement management - Communication management - Risk management Knowledge Function - Integration management
What is the UK DPA?
The UK Data Protection Act is a regulation passed in 1998, enforced by the ICO, that states that everyone responsible for using data has to follow strict rules to make sure information is: - used fairly and lawfully - used for limited, specifically stated purposes - used in a way that is adequate, relevant and not excessive - accurate - kept for no longer than is absolutely necessary - handled according to people's rights - kept safe and secure - not transferred outside of the EEA without adequate protection
What is CRUD?
The basic operations of a database: Create, Read, Update and Delete.
What is Metadata?
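The four operations map directly onto SQL statements. A minimal sketch with Python's built-in `sqlite3` module and a hypothetical `report` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (id INTEGER PRIMARY KEY, title TEXT)")

# Create: add a new row.
conn.execute("INSERT INTO report (id, title) VALUES (1, 'Sales')")

# Read: fetch the row back.
after_create = conn.execute("SELECT title FROM report WHERE id = 1").fetchone()

# Update: change the row in place.
conn.execute("UPDATE report SET title = 'Sales 2024' WHERE id = 1")
after_update = conn.execute("SELECT title FROM report WHERE id = 1").fetchone()

# Delete: remove the row.
conn.execute("DELETE FROM report WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM report").fetchone()[0]

print(after_create, after_update, remaining)
```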
The description of data as it is created, transformed, stored, accessed, and consumed in the enterprise.
What is Customer Data Science?
The intersection of Computer Science, Maths & Stats, and Customer Context
What is meant by denormalizing data?
The process of trying to improve the read performance of a database, at the expense of losing some write performance, by adding redundant copies or grouping data
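A sketch of the trade-off, again with `sqlite3` and invented table names. `purchase_wide` stores the customer name redundantly on every row, so reading needs no join, but renaming a customer now has to touch every copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Normalized: the customer name is stored once.
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Alice');
    INSERT INTO purchase VALUES (1, 1, 20.0), (2, 1, 5.0);

    -- Denormalized: the name is copied onto every purchase row.
    CREATE TABLE purchase_wide (
        id INTEGER PRIMARY KEY, customer_id INTEGER, customer_name TEXT, amount REAL
    );
    INSERT INTO purchase_wide VALUES (1, 1, 'Alice', 20.0), (2, 1, 'Alice', 5.0);
    """
)

# Read path: the normalized form needs a join, the denormalized form does not.
normalized_read = conn.execute(
    "SELECT c.name, p.amount FROM purchase AS p "
    "JOIN customer AS c ON c.id = p.customer_id ORDER BY p.id"
).fetchall()
denormalized_read = conn.execute(
    "SELECT customer_name, amount FROM purchase_wide ORDER BY id"
).fetchall()

# Write path: one rename must update every redundant copy.
cursor = conn.execute(
    "UPDATE purchase_wide SET customer_name = 'Alicia' WHERE customer_id = 1"
)
rows_touched = cursor.rowcount
print(normalized_read == denormalized_read, rows_touched)
```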
What is Big Data?
The proliferation of largely unstructured data that cannot be processed or analyzed using traditional processes or tools.
What is a database schema?
The structure of the database described in a formal language supported by the DBMS
What is a Program?
- A set of related projects - Often complex and large in nature - Often intended to make major transformational change - Manages cross-project dependencies - Often uses a program management office
How do we access information, name 4 different ways.
- Access in Place - Make a Local Copy - Share Master Solution Pattern - Centralized Master Solution Pattern
Outline a BI Project Team
- BI Steering Team - Business Advisor, BI/DW Manager, BI/DW Advisor - Data Integration, Architecture, Business Analysis, BI Development, QA Testing, IT Operations, Business BI Stakeholders
What are the different types of applications?
- Bespoke/Custom= applications developed in house - Package = applications bought as a suite from a vendor - Cloud = SaaS running in the cloud (may be bespoke or package)
What are the advantages of project management?
- Better control of financial, physical, and people resources - Improved customer relations - Shorter development times - Lower costs - Higher quality and increased reliability - Higher profit margins - Improved productivity - Better internal coordination
What are the core functions of a BI Project Development Team?
- Business Analysis - Architecture - Data Integration - BI Development
What are the layers of an architectural framework?
- Data Architecture - Technical Architecture - Product Architecture - Information Architecture
Describe the Hadoop File System
- Data Node, Name Node - Name Node = Keeps an index of what data resides on which data node 1. Application requests data from the name node 2. Name node contacts data node 3. Data node sends data directly to application
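A toy model of the index idea, not the real HDFS client API: the classes, node names, and block names are all hypothetical, and replication and block splitting are left out. In this simplified version the name node only answers "where is it?", and the data itself comes from the data node.

```python
class DataNode:
    """Holds the actual blocks of data."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def read(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Keeps only the index of which data node holds which block."""

    def __init__(self):
        self.index = {}

    def register(self, block_id, data_node, payload):
        data_node.blocks[block_id] = payload  # the data lives on the data node
        self.index[block_id] = data_node      # the name node records only the location

    def locate(self, block_id):
        return self.index[block_id]


def fetch(name_node, block_id):
    """Application side: look up the location, then read from that data node."""
    data_node = name_node.locate(block_id)
    return data_node.name, data_node.read(block_id)


name_node = NameNode()
node_a, node_b = DataNode("node-a"), DataNode("node-b")
name_node.register("sales-part-0", node_a, b"2023 rows")
name_node.register("sales-part-1", node_b, b"2024 rows")

result = fetch(name_node, "sales-part-1")
print(result)
```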
Name three types of outsourcing contracts
- Time and materials - Fixed price - Incentives
Explain the relationship between business and structural/technical metadata
- Business Metadata describes the data that the business needs, what it means, and how it should be classified and protected - Structural Metadata describes how data is actually stored and labelled in the data store - An integration engine copying data into a sandbox can discover fields that the business classifies as sensitive and mask these values dynamically
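The masking scenario in the last bullet can be sketched as follows. The two dictionaries are hypothetical stand-ins for a metadata catalogue: one maps physical column names to business terms (structural metadata), the other classifies those terms (business metadata). The column names and the `****` mask are assumptions for the example.

```python
# Structural metadata: how fields are actually labelled in the data store.
structural_metadata = {"cust_eml": "customer_email", "ord_tot": "order_total"}

# Business metadata: what each term means to the business and how it is classified.
business_metadata = {
    "customer_email": {"classification": "sensitive"},
    "order_total": {"classification": "public"},
}


def copy_to_sandbox(rows):
    """Copy rows, masking any column whose business term is classified as sensitive."""
    sandbox_rows = []
    for row in rows:
        copied = {}
        for column, value in row.items():
            term = structural_metadata[column]
            if business_metadata[term]["classification"] == "sensitive":
                copied[column] = "****"
            else:
                copied[column] = value
        sandbox_rows.append(copied)
    return sandbox_rows


source = [{"cust_eml": "alice@example.com", "ord_tot": 42.0}]
sandbox = copy_to_sandbox(source)
print(sandbox)
```

The source rows are left untouched; only the sandbox copy is masked.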
What are the suggested skills for project managers?
- Communication Skills: listening, persuading - Organizational Skills: planning, goal-setting, analyzing - Team-Building Skills: understanding, motivating - Leadership Skills: sets example, energetic, vision, delegates, positive - Coping Skills: flexibility, creativity, patience, persistence - Technological Skills: expertise, knowledge
How do you set up a governance program?
- Information governance should start small and focus on most important information, then expand out as it demonstrates worth - Must evolve with the business and seek to educate people in appropriate information management - Needs senior stakeholders and visible consequences for those who ignore requirements - Make authoritative sources clear - Make them visible - Make it easy and unavoidable to do the right thing - Should ensure: data protection, data quality, system standards, information lifecycle, proper employee operation
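What is the difference between IaaS, PaaS and SaaS?
- Infrastructure as a Service - provider supplies virtualized compute, storage, and networking; the customer manages the operating system and everything above it - Platform as a Service - provider also manages the operating system and runtime; the customer deploys and manages its own applications - Software as a Service - provider delivers and manages the complete application; the customer simply uses it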
Outline the project management processes
- Initiating processes - Planning processes - Executing and implementing processes - Monitoring and controlling processes - Closing processes
Who are the users supported by a Data Lake?
- Line of business teams - Information curator - Data lake operations - Analytics team - Governance, risk and compliance team
What are the challenges of project management?
- Managing temporary, non-repetitive activities and often acting independently of the organization - Getting the right people at the right time to address the right issues and make the right decisions
Outline a few drivers of complexity
- Number of subcontract suppliers - Contract complexity - Customer complexity - Leading edge or emerging technology - Application development complexity - Technical Diversity - Size of Project
What is the BI Program Management Office?
- Oversees cross-functional projects - Aids in prioritizing projects, managing funding, communicating with stakeholders, measuring RoI - Sponsors overall strategy, architecture - Supports program manager
What are the different models for building a data asset?
- Own - Procure - Partner
Outline the Project Roles and how they are interrelated
- Performing Organization - Client sponsor - Project Manager - Project Team - Stakeholders
What are the goals of information management?
- Provide appropriate views of information at the right time, place and format for an individual's needs - Reduce information duplication through sharing - Enforce common standards for data structures, data semantics, data quality - Synchronize information between systems in an optimal and timely manner - Measure, monitor, and protect information provision - Actively manage issues and their consequences - Identify candidates for system consolidation
What is a Pattern Language?
- Provides a foundation for setting architectural standards and reference architectures - Provides material for learning and training in information architecture
Draw the design authority diagram
- Pyramid with Chief in center - Application, Business, Data, and Technical at the ends
What are the characteristics of Agile Software Development?
- customer satisfaction through early and continuous delivery of valuable software - software delivered frequently (weeks not months) - close, daily cooperation between business people and developers - sustainable development while maintaining constant pace - simplicity is essential - continuous attention to technical excellence and good design - projects built around trustworthy, motivated individuals
Everyday Data Visualization
- declarative and data driven - simple, low volume, clear - conventional chart, static visualization type "simple, factual charts"
Visual Discovery
- exploratory and data driven - big data, complex and dynamic - advanced and unconventional visualization type "data scientists trend spotting and deep analysis"
What are the components of Application Specification?
- name/identifier - description - category - business process supported by application - data (sources, names, status) - business transformations for application (style, transformations for analysis, filters, algorithms, rules to be used on input data) - business users of application (owner, contact, developer, business priority) - estimates, costs needed to deliver (resource estimates, assumptions, risks, dependencies, issues, feedback)
What is Dunnhumby?
- world's leading customer science firm - building loyalty in a disloyal world since 1989 - affecting 1 billion customers by 2020
What are the four layers of a technical architecture?
1) Data Sources - enterprise applications, cloud applications, business processes 2) Data Integration - ETL/ELT, Data Virtualization, Master Data Management 3) Data Warehouse and BI Data Store - data mart, BI Repos, OLAP cubes 4) Business Intelligence - reports, dashboards, data visualization
What are the three categories of purpose for visualization?
1) Declarative - justifying a statement 2) Confirmatory - seeking to prove a hypothesis 3) Exploratory - mining to discover insight
What are the different levels of data?
1) Essential Data - core sales, product, and stores data captured and used for reporting 2) Advanced Data - capturing and utilizing data beyond core management (customer, digital, competitor data) 3) Joined Data - data joined using customer, product, store keys to allow for deeper analysis of behaviours and performance 4) Enriched Data - data joined and enriched (segmentations, scoring) continuously - enrichment used for reporting, analytics, and activation 5) Data Partnerships - enriched data used to drive a self-sustaining commercial relationship ecosystem - enables consumption of new data and monetization of existing organizational data
Describe the steps of Data Preparation
1) Gather - Extract data from source systems 2) Reformat the Data - Convert the data to a common format and schema to be fed into the data warehouse 3) Consolidate, Standardize, and Validate Data - Provide a single, consistent definition for business users and validate with metadata 4) Transform Data - Business transformations turn data into business information - Applying business rules, algorithms, filters to put data into a business context 5) Cleanse Data - Involves a more sophisticated analysis (customer householding, name/address checking), as simpler data quality checking has already been done (reformatting, transforming, validating) 6) Store Data - Store transformed and cleansed data in the data warehouse to be made available for further processing
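The six steps above can be sketched as a tiny pipeline. This is only an illustration under stated assumptions: the two source systems, their field names, the margin rule, and the in-memory `warehouse` list are all hypothetical, and the duplicate check is a simple stand-in for real cleansing such as householding or address checking.

```python
from datetime import datetime

# Hypothetical rows from two source systems that use different formats.
SOURCE_A = [{"id": "1", "sold": "2024-01-05", "amount": "10.50", "cost": "7.00"}]
SOURCE_B = [
    {"ID": 2, "sale_date": "05/01/2024", "amt": 20.0, "cost": 12.0},
    {"ID": 3, "sale_date": "06/01/2024", "amt": -5.0, "cost": 1.0},
    {"ID": 2, "sale_date": "05/01/2024", "amt": 20.0, "cost": 12.0},
]


def gather():
    # 1) Gather: extract rows from each source system, tagged by origin.
    return [("a", r) for r in SOURCE_A] + [("b", r) for r in SOURCE_B]


def reformat(tagged_row):
    # 2) Reformat: convert every row to one common schema.
    source, r = tagged_row
    if source == "a":
        return {"id": int(r["id"]),
                "date": datetime.strptime(r["sold"], "%Y-%m-%d").date(),
                "amount": float(r["amount"]), "cost": float(r["cost"])}
    return {"id": int(r["ID"]),
            "date": datetime.strptime(r["sale_date"], "%d/%m/%Y").date(),
            "amount": float(r["amt"]), "cost": float(r["cost"])}


def validate(row):
    # 3) Validate: assumed rule that sale amounts are never negative.
    return row["amount"] >= 0


def transform(row):
    # 4) Transform: assumed business rule, margin = amount - cost.
    return {**row, "margin": round(row["amount"] - row["cost"], 2)}


def cleanse(rows):
    # 5) Cleanse: keep the first row seen for each id (stand-in for
    #    more sophisticated checks).
    seen, unique = set(), []
    for row in rows:
        if row["id"] not in seen:
            seen.add(row["id"])
            unique.append(row)
    return unique


def prepare():
    rows = [reformat(t) for t in gather()]
    rows = [transform(r) for r in rows if validate(r)]
    return cleanse(rows)


# 6) Store: the list stands in for the data warehouse.
warehouse = prepare()
```

The invalid row (id 3) and the repeated row (id 2) never reach the store, so later steps can assume consistent data.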
Describe the steps of Data Franchising
1) Gather, Filter, and Subset Data from the Data Warehouse - Assume data preparation has already happened, and place data in temporary store, selecting the rows and columns that are needed 2) Restructure or Denormalize Data - Restructure the data to fit target schemas 3) Transformations and Calculations - Perform business transformations and metric calculations used by the specific business processes whose marts or cubes you are building 4) Aggregate or Summarize Data - Summarizing or aggregating data (parent-child fields) to improve response time 5) Store Data in Data Mart or Cube
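A minimal sketch of the franchising steps, assuming the warehouse data is already prepared. The `SALES` and `STORES` tables, the `channel` filter, and the revenue metric are hypothetical examples, and a plain dictionary stands in for the data mart.

```python
from collections import defaultdict

# Hypothetical, already-prepared warehouse tables.
SALES = [
    {"store_id": 1, "product_id": 10, "qty": 2, "price": 3.0, "channel": "shop"},
    {"store_id": 1, "product_id": 11, "qty": 1, "price": 5.0, "channel": "shop"},
    {"store_id": 2, "product_id": 10, "qty": 4, "price": 3.0, "channel": "web"},
    {"store_id": 2, "product_id": 11, "qty": 3, "price": 5.0, "channel": "shop"},
]
STORES = {1: {"region": "North"}, 2: {"region": "South"}}


def franchise(sales, stores, channel):
    # 1) Gather, filter, subset: keep only the rows and columns needed.
    subset = [
        {"store_id": s["store_id"], "qty": s["qty"], "price": s["price"]}
        for s in sales
        if s["channel"] == channel
    ]
    # 2) Denormalize: copy the store's region onto each sales row.
    flat = [{**s, "region": stores[s["store_id"]]["region"]} for s in subset]
    # 3) Transformations and calculations: revenue metric per row.
    for row in flat:
        row["revenue"] = row["qty"] * row["price"]
    # 4) Aggregate: summarize revenue by region.
    totals = defaultdict(float)
    for row in flat:
        totals[row["region"]] += row["revenue"]
    # 5) Store: the returned dict stands in for the data mart.
    return dict(totals)


mart = franchise(SALES, STORES, "shop")
```

Because the aggregation happens once when the mart is built, queries against `mart` do not have to rescan the detailed sales rows.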
What are the 2 questions about the nature and purpose of data visualization?
1) Is the information conceptual or data-driven? - identifies what you have 2) Am I declaring something or exploring something? - identifies what you're doing
Name four types of databases to be considered for the technical architecture
1) Massively Parallel Processing Arrangement (MPP) 2) In-Database Analytics 3) In-Memory Analytics 4) Cloud Computing
What are the 3 areas to focus on in the world of big data?
1) Organizational Data Capability and what it enables 2) Building High Value Data Assets 3) Privacy, GDPR, and the future of the Data Marketplace
Describe 5 processes that can be made on an OLAP cube
1) Slicing - Picking a rectangular subset of a cube by choosing a single value for one of its dimensions, which creates a cube with one fewer dimension 2) Dicing - Produces a sub cube by allowing the user to pick specific values of multiple dimensions 3) Drilling Up/Down - Expanding on a parent field to examine child fields, or minimizing child fields to view a parent field 4) Roll Up - Summarizing data along a dimension either through computing values or applying a formula 5) Pivoting - Rotating the cube to provide different perspectives on data
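The cube operations can be illustrated on a toy cube held in a dictionary. This is a sketch, not how an OLAP engine stores data: the dimension names, the sales figures, and the helper functions are all hypothetical. Drilling is left out because it needs a dimension hierarchy (for example year, quarter, month), which this flat cube does not have.

```python
from collections import defaultdict

# Hypothetical cube: (year, region, product) -> sales.
DIMS = ("year", "region", "product")
CUBE = {
    (2023, "North", "Tea"): 10,
    (2023, "North", "Coffee"): 20,
    (2023, "South", "Tea"): 30,
    (2024, "North", "Tea"): 40,
    (2024, "South", "Coffee"): 50,
}


def slice_cube(cube, dim, value):
    # Slicing: fix one dimension to a single value and drop it.
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}


def dice(cube, **allowed):
    # Dicing: keep cells whose values fall in the allowed set
    # for each named dimension; dimensionality is unchanged.
    return {
        k: v
        for k, v in cube.items()
        if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())
    }


def roll_up(cube, dim):
    # Summarizing along a dimension: sum it away.
    i = DIMS.index(dim)
    out = defaultdict(int)
    for k, v in cube.items():
        out[k[:i] + k[i + 1:]] += v
    return dict(out)


def pivot(cube, order):
    # Pivoting: reorder the dimensions to view the cube from another angle.
    idx = [DIMS.index(d) for d in order]
    return {tuple(k[i] for i in idx): v for k, v in cube.items()}


sales_2023 = slice_cube(CUBE, "year", 2023)
north_tea = dice(CUBE, region={"North"}, product={"Tea"})
by_year_region = roll_up(CUBE, "product")
by_product_first = pivot(CUBE, ("product", "region", "year"))
```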
Why are Systems of Record and the Enterprise Data Warehouse separate?
1) Systems of Record are made for data capture and processing transactions rather than for reporting and analysis 2) Data across Systems of Record is often inconsistent 3) Data quality within and across Systems of Record is a challenge addressed by Enterprise Data Warehouses 4) Difficult to access all the Systems of Record in real time
What is agent based modelling?
1) A population of agents in the model universe, with attributes that are representative of customers 2) An agent environment that includes as many relevant aspects of the retail market as possible 3) A simulation of how customers behave in the face of changes to their retail environment
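A toy sketch of the three parts: a population of agents, an environment (here reduced to a single price), and a simulation of a change. The `price_sensitivity` attribute and the purchase-probability rule are invented for illustration and are not taken from any real model.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    # Hypothetical attribute: 0 ignores price, 1 is highly price sensitive.
    price_sensitivity: float


def make_population(n, rng):
    # 1) Population of agents with customer-like attributes.
    return [Agent(price_sensitivity=rng.random()) for _ in range(n)]


def buys(agent, price, base_price, rng):
    # Assumed behaviour rule: the chance of buying falls as the price
    # rises above the base price, scaled by the agent's sensitivity.
    increase = max(0.0, (price - base_price) / base_price)
    probability = max(0.0, 1.0 - 2 * agent.price_sensitivity * increase)
    return rng.random() < probability


def simulate(population, price, base_price=1.0, seed=0):
    # 2) + 3) Run every agent against the environment (the price)
    # and count purchases. A fixed seed keeps runs comparable.
    rng = random.Random(seed)
    return sum(buys(a, price, base_price, rng) for a in population)


population = make_population(1000, random.Random(42))
sales_before = simulate(population, price=1.0)
sales_after = simulate(population, price=1.5)
```

Comparing `sales_before` with `sales_after` shows the simulated effect of a price rise on the same population.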
What are the 6 main customer types in lifestyles?
1) price sensitive 2) traditional 3) mainstream 4) kids' choice 5) convenience 6) finer foods
What are the Data Lake steps for Analytics Development?
1. Advertise 2. Catalog 3. Discover 4. Provision 5. Explore 6. Deploy
Describe the BI Application Development Lifecycle
1. Data Content Prototype 2. Data Visualizations Prototype 3. BI Application Prototype 4. BI Application
What are the steps for prototyping?
1. Define prototype scope 2. Create prototype 3. Conduct unit test 4. Users test prototype 5. Gather user feedback 6. Revise prototype 7. Prototype Sign-Off
What are the 15 Project Management Job Functions?
1. Define scope of project 2. Identify stakeholders, decision makers, escalation procedures 3. Develop detailed task list (work breakdown structures) 4. Estimate Time Requirements 5. Develop Initial Project Management Flowchart 6. Identify Resources and Budget 7. Evaluate Project Requirements 8. Identify and Evaluate Risks 9. Prepare Contingency Plan 10. Identify Interdependencies 11. Identify and Track Critical Milestones 12. Participate in Project Phase Reviews 13. Secure Needed Resources 14. Manage the Change Control Process 15. Report Project Status
Describe the Analytics Lifecycle
1. Discovery - Locate Data 2. Exploration - Build Analytical Models 3. Deployment - Run Analytical Models
Benefits of Hadoop
1. Hadoop has built-in fault tolerance - 3 copies of each data block on different nodes - Tasks can also be done by other slaves - Enterprise version of Hadoop has a backup master 2. Allows easy programming - Don't have to know where files are located - Don't have to manage failure - Don't have to worry about scalability 3. Highly scalable - Just add computers 4. Applications provide greater functionality "By 2015, 50% of enterprise data will be processed on Hadoop" - Yahoo
What are the Data Lake steps for Data Distribution?
1. Provision 2. Catalog 3. Access 4. Distribute
Outline the process of building a BI Case
1. Review the Organization's Business Initiatives 2. Enlist a BI sponsor 3. Connect with BI Stakeholders 4. Identify the Business Processes Affected 5. Identify the Business Benefits 6. Build the Technical Case 7. Select Products 8. Choose Infrastructure Platforms 9. Assess Organization's Readiness 10. Apply Realistic Expectations 11. Run a Project Definition Workshop 12. Safeguard against Scope Creep 13. Manage Risks using a Risk Register 14. Define Requirements 15. Stepwise Refinement of Requirements 16. Gather Requirements through Interviews 17. Prioritize Requirements 18. Determine and Agree Cost of Project
What is needed in an organization for defense in depth? Describe each briefly.
1. Security Engineering - Firewalls, proxies, secure email systems, remote access systems, intrusion detection systems 2. Identity and Access Management - Create user accounts, remove user accounts, allocate privilege, data classification, revalidation of user accounts 3. Logging and Monitoring - Log collection of everything, identify unusual activity, generate alerts, validate alerts, watch the security organization and other super users 4. Security Operations - Implement separation of duties - Action team has privilege but not wide visibility - Run day-to-day security processes - Respond to alerts - Set up user accounts - Set up privileges - Incident management 5. Security Architects - Ensure solutions are in line with security policy - Advice and guidance to the rest of the organization - Focused on infrastructure, networks, platform 6. Application Security - Ensure application development is in line with security policy - Own any in-house security software solutions - Advice and guidance to application development team 7. Security Compliance - Check security requirements are met - Patches - Vulnerabilities - Least privilege - Guidance from architects - Any other policy requirement 8. Risk Management - Understand holes, issues - Rate risks, communicate risks - Prioritize resolution - Ensure risks are owned, accepted, and remediated 9. Physical Security - Security monitoring - Executive protection - Secure transportation - Check physical security policies are implemented 10. Data Protection - Regulatory requirement - Ensure rules are enforced - Understand rules of PII and SPD - Provide evidence to regulators 11. Internal Audit - Check correct operations - Provide evidence to executives - Spot problems before they become external 12. Audit Response - Provide external/internal audit with required information - Ensure all security-related regulations are in place Potentially: 13. Disaster Recovery 14. Business Continuity 15. Metrics and Reporting
What is needed for a defense in depth system?
1. Security Policy - Covers basic security requirements 2. Technical Specification - Specifies how to implement policy in different circumstances - Settings on system - Design constraints - Processes and procedures required 3. An organization to support these
Outline the differences between entities and attributes
An entity is something that exists and is capable of being described. It is a noun about which an organization maintains facts. An attribute is a characteristic of an entity that identifies, describes the entity, or relates an entity to another. Attributes can be thought of as adjectives and should be atomic (unable to be decomposed).
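As a small illustration, entities can be sketched as classes and attributes as their fields. The `Customer` and `Order` entities and all field names are hypothetical examples chosen for this sketch.

```python
from dataclasses import dataclass, fields


@dataclass
class Customer:          # Entity: a noun the organization keeps facts about.
    customer_id: int     # Attribute that identifies the entity.
    first_name: str      # Attributes that describe the entity; each is
    last_name: str       # atomic (the name is split rather than stored
    postcode: str        # as one decomposable "full name" value).


@dataclass
class Order:             # A second entity.
    order_id: int        # Identifying attribute.
    customer_id: int     # Attribute that relates Order to Customer.
    total: float         # Descriptive attribute.


alice = Customer(1, "Alice", "Smith", "AB1 2CD")
order = Order(100, alice.customer_id, 25.0)

# The attributes of an entity are simply its field names.
attribute_names = [f.name for f in fields(Customer)]
```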
What are the 3 interlocking life cycles of information governance?
Definition - Managing policies, rules and classifications - Cycle: Define, Roll-out, Monitor, Audit Operations - Managing and curating information sources and business terms - Cycle: Detect, Remediate, Classify, Execute Development - Managing common information needs and related rule implementation - Cycle: Design, Develop, Deploy Metadata at center of three rings
How do digital channels interact with the Data Lake?
Digital channels in enterprise IT interact with the data lake through the Enterprise IT Data Exchange, which exchanges data and insight with the data lake.
What are the 5 C's of Data?
In order for a BI program to deliver actionable information to business users, data must be: Clean - free from errors, missing items and invalid entries that would wreak havoc on an automated system Consistent - there should be no disagreement about which version of data is the correct one Conformed - the business needs to analyze data across shareable, common dimensions - data must conform to the standards set by the business Current - business needs to base decisions on whatever level of data currency is necessary Comprehensive - business should have all the data necessary to perform operations
What is OLAP?
OLAP (Online Analytical Processing) presents the user with information through a multi-dimensional cube rather than raw data. It makes it easy for users to identify trends or patterns to answer high-level business questions.
What is OLTP?
OLTP stands for Online Transaction Processing and is a form of processing that covers applications that work with transactional or atomic data. - Typically OLTP applications gather groups of records and present them to the user.
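A minimal sketch of transactional, atomic processing using Python's built-in `sqlite3` module. The `account` table, the balances, and the `transfer` helper are hypothetical. The point is that the two updates in a transfer either both apply or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one atomic transaction."""
    try:
        # The connection context manager commits on success and
        # rolls back if an exception is raised inside the block.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit to
        # dst has been rolled back along with the failed debit.
        return False


ok = transfer(conn, 1, 2, 30)
rejected = transfer(conn, 1, 2, 500)
balances = dict(conn.execute("SELECT id, balance FROM account"))
```

The second transfer fails after the credit has already been written, yet the final balances show no trace of it.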
What are the pros and cons of security outsourcing?
Pros: ○ If successful, partner has the skills you lack ○ Requirements can be specified in a contract ○ You know the up front cost Cons: ○ How can you check the outsourcer is doing things correctly? ○ Anything not included in the contract will cost a lot later ○ Outsourcers have problems of their own
What is SQL?
SQL (Structured Query Language, often pronounced "sequel") is a special-purpose programming language designed for managing data held in a relational database management system or for stream processing in a relational data stream management system.
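A short example of SQL in use, run through Python's built-in `sqlite3` module so it is self-contained. The `sale` table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Define a table and load a few hypothetical rows.
conn.executescript(
    """
    CREATE TABLE sale (store TEXT, product TEXT, amount REAL);
    INSERT INTO sale VALUES
        ('North', 'Tea', 10.0),
        ('North', 'Coffee', 20.0),
        ('South', 'Tea', 30.0);
    """
)

# Query: total sales per store, in alphabetical order.
rows = conn.execute(
    "SELECT store, SUM(amount) AS total "
    "FROM sale GROUP BY store ORDER BY store"
).fetchall()
```

The query states what result is wanted (totals grouped by store) and leaves how to compute it to the database engine.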
What are the three classes of decision making that BI Supports?
Strategic Decisions - long term consequences - broad implications - wide effect on company - made fairly infrequently Tactical Decisions - less widespread consequences - made on a more frequent basis (weekly, monthly) - historically BI initiatives have focused on these decision makers Operational Decisions - more detailed in nature - may affect fewer people - daily
What is microlocationing and how is it enabled?
The ability to accurately pinpoint the location of a customer in a store, allowing personalized greetings and discounts based on location. Enabled through the 14+ sensors on a phone - accelerometers, gyroscope, barometer, proximity sensor, light sensor, Bluetooth, Wi-Fi, touch screen, GPS, NFC, camera, etc.