400 MIDTERM

what steps can an organization take to ensure the security and confidentiality of customer data in its data warehouse

- Establish effective corporate and security policies and procedures
- Implement logical security procedures and techniques to restrict access
- Limit physical access to the data center environment
- Establish an effective internal control review process with an emphasis on security and privacy

what are the main components of a business reporting system?

One is the online transaction processing system (ERP, POS, etc.) that records transactions. Second is a data supply that takes recorded events and transactions and delivers them to the reporting system. Next comes an ETL component that ensures quality and performs necessary transformations prior to loading the data into a data store. Then there is the data storage itself (such as a data warehouse).

dashboard

A visual presentation of critical data for executives to view. It allows executives to see hot spots in seconds and explore the situation.

Why is the ETL process so important for data warehousing efforts?

ETL tools are the first essential step in the data warehousing process that eventually lets you make more informed decisions in less time

Database Management System (DBMS)

(DBMS) Software for establishing, updating, and querying a database

business analytics

the application of analytics to business problems/data

analytics

the science of analysis

analytics ready

a state of preparedness for analytics projects, especially as it relates to data acquisition and preparedness

correlation

a statistical measure that indicates the extent to which two factors vary together and thus how well one factor can be predicted from the other. Correlations can be positive or negative.
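
A minimal sketch of how such a measure can be computed, here as the Pearson correlation coefficient. The data values and variable names are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: hours studied rise with exam score,
# while hours of TV fall as exam score rises.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 71, 80, 88]
tv = [5, 4, 3, 2, 1]

r_pos = pearson_r(hours, scores)  # close to +1: positive correlation
r_neg = pearson_r(tv, scores)     # close to -1: negative correlation
```

A value near +1 or -1 means one factor predicts the other well; a value near 0 means it does not.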

algorithm

a step-by-step search in which improvement is made at every step until the best solution is found

data mining

A process that uses statistical, mathematical, artificial intelligence, and machine-learning techniques to extract and identify useful information and subsequent knowledge from large databases

what are the differences and commonalities between dashboards and scorecards?

Dashboards offer a broad way to track strategic goals and measure a company's overall efficiency. Scorecards, on the other hand, provide a quick and concise way to measure KPIs and give a clear indication of how well organizations are working to achieve their targets.

define data mining? why are there many different names and definitions for it?

Data mining is the process that helps in extracting information from a given data set to identify trends, patterns, and useful data. The objective of using data mining is to make data-supported decisions from enormous data sets

what are the major data mining processes?

The data mining process includes business understanding, data understanding, data preparation, modeling, evaluation, and deployment. Important data mining techniques are classification, clustering, regression, association rules, outlier detection, sequential patterns, and prediction.

what are the main data preprocessing steps?

Data quality assessment. Data cleaning. Data transformation. Data reduction.

ordinal data

Data that contain codes assigned to objects or events as labels that also represent the rank order among them. For example, the variable credit score can be generally categorized as (1) low, (2) medium, and (3) high.
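
A small sketch of the credit-score example: because the codes carry rank order, sorting and comparing them is meaningful. The mapping name and the sample values are assumptions for illustration.

```python
# Hypothetical ordinal scale for the credit-score example.
CREDIT_RANK = {"low": 1, "medium": 2, "high": 3}

applicants = ["medium", "high", "low", "medium"]

# Rank order is meaningful, so sorting and comparisons make sense.
ranked = sorted(applicants, key=CREDIT_RANK.get)
is_better = CREDIT_RANK["high"] > CREDIT_RANK["medium"]
```

The gaps between codes are not meaningful, though: nothing says "high" is as far from "medium" as "medium" is from "low".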

structured data

Data that is formatted (often into tables with rows and columns) for computers to easily understand and process.

categorical data

Data that represent the labels of multiple classes used to divide a variable into specific groups.

What is data visualization? Why is it needed?

Data visualization helps to tell stories by curating data into a form easier to understand, highlighting the trends and outliers. A good visualization tells a story, removing the noise from data and highlighting the useful information

describe the data warehousing process

Data warehousing is a process used to collect and manage data from multiple sources into a centralized repository to drive actionable business insights. With all your data in one place, it becomes simpler to perform analysis and reporting at different aggregate levels

What recent technologies may shape the future of data warehousing, why?

Sourcing: Web/social media/Big Data, open source software, software as a service, cloud computing
Infrastructure: columnar, real-time data warehousing, DW appliances, data management technologies and practices, in-database processing technology, in-memory storage technology, new database management systems, advanced analytics

describe the three steps of ETL

Extraction: selecting data from one or more sources and reading the selected data. Transformation: converting data from their original form to whatever form the DW needs. This step often also includes cleansing of the data to remove as many errors as possible. Load: putting the converted (transformed) data into the DW
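
The three steps can be sketched roughly as follows. This is an illustration only: the CSV content, the `fact_orders` table, and the in-memory SQLite database standing in for the DW are all assumptions.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: a CSV export from an operational system.
SOURCE_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,not_a_number
3,Carol,7.25
"""

def extract(text):
    """Extract: read the selected rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: convert rows to the DW's form and cleanse out bad ones."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # cleansing: skip rows with an unparseable amount
        clean.append((int(row["order_id"]), row["customer"].strip().title(), amount))
    return clean

def load(conn, rows):
    """Load: put the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")  # stand-in for the DW
load(warehouse, transform(extract(SOURCE_CSV)))
loaded = warehouse.execute(
    "SELECT order_id, customer, amount FROM fact_orders ORDER BY order_id"
).fetchall()
```

Order 2 never reaches the warehouse because its amount fails cleansing during the transform step.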

what skills should a DWA possess? why

Familiar with high-performance software, hardware, and networking technologies. Possess solid business insight, decision-making processes and communication skills

what is a business report? what are the main characteristics of a good business report? why is it needed?

Information included in the document should be accurate, relevant and informative to its readers. These are important characteristics of good reports. When reading a report to gain a deeper understanding of an issue, a businessperson shouldn't have to sift through paragraphs of filler content

why has information visualization become a centerpiece in BI and business analytics? is there a difference between information visualization and visual analytics ?

Visualization has become central because it offers rapid understanding, which is essential for difficult information. Information visualization is aimed at answering "what happened" and "what is happening" and is closely associated with business intelligence. Visual analytics is aimed at answering "why is it happening" and "what is more likely to happen" and is usually associated with business analytics.

what is the difference between information visualization and visual analytics?

Information visualization typically focuses on abstract data, that is, data without any agreed-upon depiction, such as financial data, text, statistics, databases, and software. It is aimed at answering "what happened" and "what is happening" and is closely associated with business intelligence. Visual analytics emphasizes analytical reasoning about data and combines computational analysis techniques with interactive visualizations. It is aimed at answering "why is it happening" and "what is more likely to happen" and is usually associated with business analytics.

data integration

Integration that comprises three major processes: data access, data federation, and change capture. When these three processes are correctly implemented, data can be accessed and made accessible to an array of ETL tools, analysis tools, and data warehousing environments.

why is the original/raw data not readily usable by analytics tasks?

It is usually dirty, misaligned, overly complex, and inaccurate. Data preprocessing is necessary to convert raw data into refined, analytics-ready data.

what was the primary difference between systems called MIS, DSS and ESS?

MIS (management information system) = related to managing internal operations and documents. DSS (decision support system) = helps employees make decisions, even for daily tasks. ESS (executive support system, also called EIS, executive information system) = assists senior-level managers in making important, critical decisions.

what is a report? what are reports used for?

Reports are documents designed to record and convey information to the reader. Reports are part of any business or organization; from credit reports to police reports, they serve to document specific information for specific audiences, goals, or functions.

what are the main categories of data? what types of data can we use for BI and analytics?

The main categories of data are structured data and unstructured data. Both of these types of data can be used for business intelligence and analytics, although it is easier and more expedient to use structured data

List the benefits of data warehouses.

- Saves time
- Improves data quality
- Improves business intelligence
- Leads to data standardization and consistency
- Enhances return on investment (ROI)
- Stores historical data
- Increases data security

what is regression and what statistical purpose does it serve?

Regression attempts to determine the strength and character of the relationship between one dependent variable (usually denoted by Y) and a series of other variables (known as independent variables).

what are the tools used in descriptive analytics?

line graphs, pie charts, box & whiskers, scatter plot

data richness

means that all the required data elements are included in the data set

list and describe the three major categories of business reports

metric management reports:involve outcome-oriented metrics based on service level agreements and/or key performance indicators. dashboard-type reports:present a range of performance indicators on one page, with both static/predefined elements and customizable widgets and views. balanced scorecard-type reports:present an integrated view of a company's health and include financial, customer, business process, and learning/growth perspectives.

what does it mean to clean/scrub the data? what activities are performed in this phase?

the process of fixing incorrect, incomplete, duplicate or otherwise erroneous data in a data set. It involves identifying data errors and then changing, updating or removing data to correct them

Online Transaction Processing (OLTP)

transaction system that is primarily responsible for capturing and storing data related to day-to-day business functions

Interval Data

variables that can be measured on interval scales

what are the characteristics of big data?

volume, value, variety, velocity, and veracity.

visual analytics

an extension of data/information visualization that includes not only descriptive but also predictive analytics

business analyst

an individual whose job is to analyze business processes and the support they receive (or need) from information technology

big data analytics

application of analytics methods and tools to Big Data

Decision or normative analytics

also called prescriptive analytics, is a type of analytics modeling that aims at identifying the best possible decision from a large set of alternatives

data mart

A departmental data warehouse that stores only relevant data

list and describe the major components of BI

BI systems have four major components: the data warehouse (with its source data), business analytics (a collection of tools for manipulating, mining, and analyzing the data in the data warehouse), business performance management (for monitoring and analyzing performance), and the user interface (e.g., a dashboard).

what are the best practices in business reporting? how can we make our reports stand out?

Be strategic with your reporting. Be consistent in your reports. Simplify the data being collected. Coordinate with team members to ensure consistency of data. Set milestones. Keep database records. Reassess and reevaluate internal reporting practices. Tell a story.

when developing a successful data warehouse, what are the most important risks and issues to consider and potentially avoid?

- Starting with the wrong sponsorship chain
- Setting expectations that you cannot meet
- Engaging in politically naïve behavior
- Loading the warehouse with information just because it is available
- Believing that data warehousing database design is the same as transactional database design
- Choosing a data warehouse manager who is technology oriented rather than user oriented
- Focusing on traditional internal record-oriented data and ignoring the value of external data and of text, images, and, perhaps, sound and video
- Delivering data with overlapping and confusing definitions
- Believing promises of performance, capacity, and scalability
- Believing that your problems are over when the data warehouse is up and running
- Focusing on ad hoc data mining and periodic reporting instead of alerts

in your opinion, what are the top three data-related challenges for better analytics?

- poor quality
- too much data
- visual representation

what are the main data preprocessing steps? list and explain their importance in analytics

- data cleaning: the process of removing incorrect, incomplete, and inaccurate data from the datasets; it also replaces missing values
- data integration: the process of combining multiple sources into a single dataset
- data reduction: helps reduce the volume of the data
- data transformation: a change made to the format or the structure of the data
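
A toy walk-through of the four steps, as a sketch only. The two sources (`crm`, `billing`), their fields, and the choice of mean imputation and min-max scaling are assumptions picked for illustration.

```python
# Hypothetical records from two sources, keyed by customer id.
crm = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 2, "age": None},  # duplicate row
    {"id": 3, "age": 58},
]
billing = {1: 120.0, 2: 80.0, 3: 200.0}

# Cleaning: drop duplicate ids, then fill missing ages with the mean known age.
seen, cleaned = set(), []
for row in crm:
    if row["id"] not in seen:
        seen.add(row["id"])
        cleaned.append(dict(row))
known = [r["age"] for r in cleaned if r["age"] is not None]
mean_age = sum(known) / len(known)
for r in cleaned:
    if r["age"] is None:
        r["age"] = mean_age

# Integration: combine the two sources into a single dataset.
merged = [{**r, "spend": billing[r["id"]]} for r in cleaned]

# Reduction: keep only the columns the analysis needs.
reduced = [{"age": r["age"], "spend": r["spend"]} for r in merged]

# Transformation: min-max scale spend into the range [0, 1].
lo = min(r["spend"] for r in reduced)
hi = max(r["spend"] for r in reduced)
prepared = [
    {"age": r["age"], "spend": (r["spend"] - lo) / (hi - lo)} for r in reduced
]
```

Each step matters for a different reason: cleaning protects accuracy, integration gives a single view, reduction cuts volume, and transformation puts values on a comparable scale.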

discuss the major issues with implementing BI

- expensive
- not looking at the right data
- not implemented company-wide
- no mobile base

Discuss the major issues in implementing BI

1) Too expensive and hard to justify the ROI of BI
2) Lack of company-wide adoption
3) Analyzing data from different data sources
4) Businesses aren't measuring the right indicators
5) Delivering mobile-based BI is no easy feat
6) Providing true self-service analytics

what is an information dashboard? why are they so popular? what do they present?

A data dashboard is an information management tool used to track, analyze, and display key performance indicators, metrics, and data points. You can use a dashboard to monitor the overall health of your business, department, or a specific process. Dashboards are one of the most popular capabilities of BI platforms because they present easily understandable data analysis, allow you to customize which information you want to view, and provide a way to share the results of your analysis with others

data integrity

A part of data quality where the accuracy of the data (as a whole) is maintained during any operation (such as transfer, storage, or retrieval)

what is a business performance management? how does it relate to BI?

A performance management system is a mechanism for tracking the performance of employees consistently and measurably. Performance Management uses data collection to evaluate and improve the process and methodologies of an organization. BI helps companies react to situations they've uncovered through analytics

data warehouse administrator DWA

A person responsible for the administration and management of a data warehouse

data warehouse

A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format

data preprocessing

A tedious process of converting raw data into an analytic ready state.

Nominal Data

A type of data that contains measurements of simple codes assigned to objects as labels, which are not measurements. For example, the variable marital status can be generally categorized as (1) single, (2) married, and (3) divorced.
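
Because nominal codes are labels and not measurements, a common way to feed them to a model is one indicator per label. A minimal sketch using the marital status example; the function name `one_hot` is an assumption.

```python
STATUSES = ["single", "married", "divorced"]  # labels only, no inherent order

def one_hot(value, categories=STATUSES):
    """Encode a nominal label as a 0/1 indicator vector."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]

encoded = one_hot("married")
```

This avoids implying that code 3 (divorced) is somehow "more" than code 1 (single).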

Numeric Data

A type of data that represents the numeric values of specific variables. Examples of numerically valued variables include age, number of children, total household income (in U.S. dollars), travel distance (in miles), and temperature (in Fahrenheit degrees).

operational data store

A type of database often used as an interim area for a data warehouse, especially for customer information files.

describe the major components of a data warehouse

A typical data warehouse has four main components: a central database, ETL (extract, transform, load) tools, metadata, and access tools. All of these components are engineered for speed so that you can get results quickly and analyze data on the fly.
1. Central database: A database serves as the foundation of your data warehouse. Traditionally, these have been standard relational databases running on premise or in the cloud. But because of Big Data, the need for true, real-time performance, and a drastic reduction in the cost of RAM, in-memory databases are rapidly gaining in popularity.
2. Data integration: Data is pulled from source systems and modified to align the information for rapid analytical consumption using a variety of data integration approaches such as ETL (extract, transform, load) and ELT as well as real-time data replication, bulk-load processing, data transformation, and data quality and enrichment services.
3. Metadata: Metadata is data about your data. It specifies the source, usage, values, and other features of the data sets in your data warehouse. There is business metadata, which adds context to your data, and technical metadata, which describes how to access data, including where it resides and how it is structured.
4. Data warehouse access tools: Access tools allow users to interact with the data in your data warehouse. Examples of access tools include query and reporting tools, application development tools, data mining tools, and OLAP tools.

differentiate among a DM, an ODS, and an EDW

An ODS (Operational Data Store) is the database from which a business operates on an ongoing basis. Both an EDW and a data mart (DM) are data warehouses. An EDW (Enterprise Data Warehouse) is an all-encompassing DW that covers all subject areas of interest to the entire organization. A data mart is a smaller DW designed around one problem, organizational function, topic, or other suitable focus area.

Online Analytical Processing (OLAP)

An information system that enables the user, while at a PC, to query the system, conduct an analysis, and so on. The result is generated in seconds.

What is an ODS?

An operational data store (ODS) is a central database that provides a snapshot of the latest data from multiple transactional systems for operational reporting. It enables organizations to combine data in its original format from various sources into a single destination to make it available for business reporting

what is the relationship between statistics and business analytics?

Analytics helps you form hypotheses, while statistics lets you test them

Ratio Data

Continuous data where both differences and ratios are interpretable. The distinguishing feature of a ratio scale is the possession of a non-arbitrary zero value.
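
A quick illustration of why the non-arbitrary zero matters, using temperature: Celsius has an arbitrary zero (interval scale), while Kelvin starts at absolute zero (ratio scale). The sample temperatures are made up.

```python
def celsius_to_kelvin(c):
    """Convert a Celsius temperature to Kelvin."""
    return c + 273.15

c_low, c_high = 10.0, 20.0

# On the Celsius scale the ratio is 2.0, but 20 C is not "twice as hot" as 10 C.
naive_ratio = c_high / c_low

# On the Kelvin scale the ratio is interpretable, and it is only about 1.035.
kelvin_ratio = celsius_to_kelvin(c_high) / celsius_to_kelvin(c_low)
```

Differences are interpretable on both scales (a 10-degree gap either way); only the ratio changes meaning.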

what are the commonalities and differences between regression and correlation?

Correlation is a single statistic, or data point, whereas regression is the entire equation with all of the data points that are represented with a line. Correlation shows the relationship between the two variables, while regression allows us to see how one affects the other
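
A minimal sketch contrasting the two on the same data: the fit returns a whole equation (slope and intercept) that can predict new values, while the correlation is one number. The data and names are hypothetical, and this uses ordinary least squares with a single predictor.

```python
import math

def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; also returns Pearson r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

ad_spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]  # hypothetical data lying exactly on y = 1 + 2x

slope, intercept, r = fit_line(ad_spend, sales)
predicted = intercept + slope * 6  # regression can predict; r alone cannot
```

Both quantities are built from the same sums of deviations, which is the commonality; only regression gives the rate of change and a prediction.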

why should storytelling be a part of your reporting and data visualization?

Crafting story elements helps define characters, understand the challenge, identify the hurdles, and crystallize the outcome or decision question. It provides meaning and value. This insight helps decision-making and spurs action; therefore, it's the most meaningful. In a world where we are besieged by data but desperate for meaning, data storytelling is a powerful tool to connect the dots and provide insight.

identify and discuss the role of middleware tools

Middleware speeds development of distributed applications by simplifying connectivity between applications, application components and back-end data sources. Middleware integration tools connect critical internal and external systems. Integration capabilities like transformation, connectivity, composability, and enterprise messaging, combined with SSO authentication, make it easier for developers to extend capabilities across different applications.

what is a performance measurement system? how does it work?

Performance measurement deals specifically with performance measures. These are the quantitative indicators you put in place to track the progress against your strategy. Typically good performance measures cover a wide variety of criteria, like: -Financial measures -Customer measures -Process measures -People measures

What is OLAP and how does it differ from OLTP?

OLAP = extracts data for complex analysis; queries often involve large numbers of records; uses a multidimensional schema; can support complex queries of multiple data facts from current and historical data; response times are orders of magnitude slower than OLTP; OLAP systems can be backed up less frequently.
OLTP = ideal for making simple updates, insertions, and deletions in databases; uses a traditional DBMS to accommodate a large volume of real-time transactions, where every millisecond counts. Workloads involve simple read and write operations via SQL (Structured Query Language), requiring less time and less storage space; systems modify data frequently.
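
The two workload shapes can be sketched against one small table. This is only an illustration: a real OLAP system would normally sit on a separate store with a multidimensional schema, and the `sales` table and its values are assumptions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE sales "
    "(id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)

# OLTP-style workload: many small writes, each touching a single row.
orders = [
    ("east", 2023, 100.0),
    ("east", 2024, 150.0),
    ("west", 2023, 80.0),
    ("west", 2024, 120.0),
]
for region, year, amount in orders:
    db.execute(
        "INSERT INTO sales (region, year, amount) VALUES (?, ?, ?)",
        (region, year, amount),
    )
db.execute("UPDATE sales SET amount = 160.0 WHERE id = 2")
db.commit()

# OLAP-style workload: one read-only query that scans and aggregates many rows.
totals = db.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
```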

OLAP

Online analytical processing: the manipulation of information to create business intelligence in support of strategic decision making.

what is a performance management system? why do we need one?

Performance management systems help employees and managers set goals and track progress with shared tracking tools.

what are the sources of big data?

The bulk of big data generated comes from three primary sources: social data, machine data and transactional data.

what is a balanced scorecard (BSC)? where did it come from?

The concept of BSCs was first introduced in 1992 by David Norton and Robert Kaplan, who took previous metric performance measures and adapted them to include nonfinancial information. BSCs were originally developed for for-profit companies but were later adapted for use by nonprofits and government agencies

what are the commonalities and differences between linear regression and logistic regression?

They are both parametric Regressions, and both utilize a linear equation to arrive at predictions. However, the similarities end there. In Linear regression the result is continuous. In Logistic Regression, there are only a limited number of possible values
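
A sketch of the contrast with hand-picked (not fitted) weights: both start from the same linear equation, but logistic regression passes it through a sigmoid and maps the result to a class. The function names, weights, and the 0.5 threshold are assumptions.

```python
import math

def linear_predict(x, w, b):
    """Linear regression: the value of the linear equation is the prediction."""
    return w * x + b

def logistic_predict(x, w, b, threshold=0.5):
    """Logistic regression: the linear equation goes through a sigmoid to give
    a probability, which is then mapped to one of two classes."""
    prob = 1.0 / (1.0 + math.exp(-(w * x + b)))
    return prob, int(prob >= threshold)

price = linear_predict(3.0, w=2.0, b=1.0)           # continuous result
prob, label = logistic_predict(3.0, w=2.0, b=-5.0)  # probability, then 0 or 1
```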

what are the common characteristics of dashboards and other information visuals

- They use visual components to highlight, at a glance, the data and exceptions that require action.
- They are transparent to the user, meaning that they require minimal training and are extremely easy to use.
- They communicate information quickly.
- They display information clearly and efficiently.
- They show trends and changes in data over time.
- They are easily customizable.
- The most important widgets and data components are effectively presented in a limited space.

why do we need data transformation? what are the commonly used data transformation tasks?

Transformed data may be easier for both humans and computers to use. Common tasks: data discovery, data mapping

what are the key similarities and differences between a two-tiered architecture and a three-tiered architecture

Two tier = client-server architecture. Application logic is either buried inside the user interface on the client or within the database on the server. 2 layers: client tier and database tier. It is easy to build and maintain, but runs slower, is less secure, and loses performance as the number of users increases.
Three tier = web-based application. Application logic or process resides in the middle tier, separated from the data and the user interface. 3 layers: client layer, business layer, and data layer. It is complex to build and maintain, but runs faster and is more secure; there is some performance loss when the system runs over the internet, yet it still performs better than two tier.

Prescriptive Analytics

a branch of business analytics that deals with finding the best possible solution alternative for a given problem

descriptive statistics

a branch of statistical modeling that aims to describe a given sample of data
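
For example, a few common descriptive measures computed with Python's standard `statistics` module. The sample values are made up.

```python
import statistics

sample = [4, 8, 15, 16, 23, 42]  # hypothetical sample

summary = {
    "mean": statistics.mean(sample),      # central tendency
    "median": statistics.median(sample),  # middle value
    "stdev": statistics.stdev(sample),    # sample standard deviation (spread)
    "min": min(sample),
    "max": max(sample),
}
```

These numbers describe only the sample at hand; they make no claim about a wider population.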

Big Data

a broad term for datasets so large or complex that traditional data processing applications are inadequate.

Predictive Analytics

a business analytical approach toward forecasting (e.g., demand, problems, opportunities) that is used instead of simply reporting data as they occur

analytics ecosystem

a classification of sectors, technology/solution providers, and industry participants for analytics

database

a collection of organized data that allows access, retrieval, and use of data

Business Intelligence

a conceptual framework for managerial decision support. It combines architecture, databases (or data warehouses), analytics tools, and applications

regression

a data mining method for real-world prediction problems where the predicted values (i.e, the output variable or dependent variable) are numeric (e.g., predicting the temperature for tomorrow as 68F).

Extraction, transformation, and loading (ETL)

a data warehousing process that consists of extracting (reading data from a database), transforming (converting the extracted data from its previous form into the form in which it needs to be so that it can be placed into a data warehouse or simply another database) and loading (putting the data into the data warehouse)

Data Visualization

a graphical, animation, or video presentation of data and the results of data analysis

how does a data warehouse differ from a transactional database?

A transactional database is designed (and optimized) to record. Transactional databases are online transaction processing (OLTP) systems where every transaction has to be recorded, and quickly. A data warehouse (DW), on the other hand, is a database designed (and optimized) for querying and analysis, so that it can respond to the analysis questions that are critical for the business. Often designed as online analytical processing (OLAP) systems, these databases contain read-only data that can be queried and analyzed far more efficiently than regular OLTP application databases.

descriptive (or reporting) analytics

an earlier phase in analytics continuum that deals with describing the data - answering the question of what happened and why did it happen

DSS or decision support systems

couple the intellectual resources of individuals with the capabilities of the computer to improve the quality of decisions. It is a computer-based support system for management decision makers who deal with semistructured problems

what is metadata? explain the importance of metadata?

data about data. in a data warehouse, metadata describes the contents of a data warehouse and the manner of its use

how do you describe the importance of data in analytics? can we think of analytics without data?

data is the raw material for analytics. without data there would be no analytics.

unstructured data

data that do not have a predetermined format and are stored in the form of textual documents

data

refers to a collection of facts usually obtained as the result of experiments, observations, transactions, or experiences

data granularity

requires that the variables and the data values be defined at the lowest level of detail for the intended use of data

What is scalability? How does it apply to DW?

scalability refers to the degree to which a system can adjust to changes in demand without major additional changes or investments. DW scalability issues are the amount of data in the warehouse, how quickly the warehouse is expected to grow, the number of concurrent users, and the complexity of user queries

