MIS Chapters 5 and 6
Define the role of a data scientist.
- Combines strong business acumen, a deep understanding of analytics, and a healthy appreciation of the limitations of data, tools, and techniques
- Delivers real improvements in decision making
- Is a highly inquisitive person
- Educational requirements are quite rigorous
- Job outlook is extremely bright
State the purpose of data normalization.
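Data normalization organizes the data in a relational database to eliminate redundancy, so that each fact is stored in only one place and an update cannot leave conflicting copies. It builds on the relational model:
- Data is organized into relations
- Rows are entities and columns are attributes
- Each row has a unique primary key
- Primary and foreign keys enable relationships between tables
- User queries perform operations on the database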
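As a rough illustration of why normalization matters, the sketch below splits a flat order list, in which a customer's city is repeated on every order, into two related collections linked by a key. The field names (`customer_id`, `customer_city`, and so on) and the helper `city_for_order` are made up for this example and do not come from the chapter.

```python
# Flat (unnormalized) data: the customer's city is repeated on every order row.
flat_orders = [
    {"order_id": 1, "customer_id": 7, "customer_city": "Austin", "item": "pen"},
    {"order_id": 2, "customer_id": 7, "customer_city": "Austin", "item": "ink"},
    {"order_id": 3, "customer_id": 9, "customer_city": "Boise", "item": "pad"},
]

# Normalized: each customer fact is stored once, keyed by customer_id (primary key).
customers = {row["customer_id"]: {"city": row["customer_city"]} for row in flat_orders}

# Orders keep only a reference to the customer (a foreign key), not a copy of the city.
orders = [
    {"order_id": row["order_id"], "customer_id": row["customer_id"], "item": row["item"]}
    for row in flat_orders
]

def city_for_order(order_id):
    """Follow the foreign key from an order to its customer's city."""
    order = next(o for o in orders if o["order_id"] == order_id)
    return customers[order["customer_id"]]["city"]

# One update now changes the city for every order placed by customer 7.
customers[7]["city"] = "Dallas"
```

After the update, both of customer 7's orders resolve to the new city through `city_for_order`, whereas the flat version would have needed one edit per order row.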
State four ways a NoSQL database differs from an SQL database.
NoSQL databases differ from relational (SQL) databases in that they:
- Model data without 2D tabular relations
- Use horizontal scaling
- Do not require a predefined schema (the structure can change as needs change)
- Typically do not conform to the ACID properties when processing transactions
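Two of these differences, the lack of a predefined schema and horizontal scaling, can be sketched with a toy key-value document store. This is a teaching sketch only, not the API of any real NoSQL product: `TinyDocumentStore`, its methods, and the choice of a CRC32 hash to pick a shard are all assumptions made for the example.

```python
import zlib

class TinyDocumentStore:
    """Toy document store: schema-less documents spread across several shards."""

    def __init__(self, shard_count=3):
        # Each shard stands in for a separate server in a horizontally scaled cluster.
        self.shards = [dict() for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash of the key decides which shard owns the document.
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def put(self, key, document):
        # No schema check: any dictionary of fields is accepted.
        self._shard_for(key)[key] = document

    def get(self, key):
        return self._shard_for(key).get(key)

store = TinyDocumentStore()
store.put("user:1", {"name": "Ana", "email": "ana@example.com"})
store.put("user:2", {"name": "Ben", "tags": ["vip"], "address": {"city": "Boise"}})

total_documents = sum(len(shard) for shard in store.shards)
```

The two documents have different fields, which a relational table with fixed columns would not allow without a schema change.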
Identify two key benefits of enforcing the ACID properties on SQL databases.
The ACID properties are atomicity, consistency, isolation, and durability. Enforcing them:
1. Guarantees that database transactions are processed reliably
2. Ensures the integrity of the data in the database
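Atomicity, the first of these properties, can be seen in a small sketch using Python's built-in `sqlite3` module. The `account` table, the `transfer` function, and the balances are invented for the illustration; the point is only that a transfer which fails halfway leaves no partial change behind.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            # Credit first, so a failed debit leaves something to roll back.
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit was undone too.
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

ok = transfer(conn, 1, 2, 30)       # enough funds: both updates are committed
failed = transfer(conn, 1, 2, 500)  # not enough funds: both updates are rolled back
balances = dict(conn.execute("SELECT id, balance FROM account"))
```

The `CHECK (balance >= 0)` rule is the consistency side of the same example: the database refuses to end a transaction in a state that violates it.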
Explain the purpose of each step in the extract, transform, and load process.
Extract: pulls data from a variety of sources.
Transform: edits and transforms the data into the data warehouse format.
Load: loads the transformed data into the data warehouse.
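The three steps can be sketched end to end in a few lines of Python. Everything here is a stand-in: the inline CSV text plays the role of a source system, an in-memory SQLite database plays the role of the data warehouse, and the table name `fact_orders` and the cleaning rules are assumptions chosen for the example.

```python
import csv
import io
import sqlite3

RAW_CSV = """order_id,customer,amount,order_date
1, alice ,19.99,2021-03-01
2,BOB,not_a_number,2021-03-02
3,carol,5.00,2021-03-05
"""

def extract(text):
    """Extract: read raw rows from the source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: edit the rows into the warehouse format, rejecting bad ones."""
    clean = []
    for row in rows:
        try:
            amount_cents = round(float(row["amount"]) * 100)
        except ValueError:
            continue  # reject rows whose amount is not a number
        customer = row["customer"].strip().title()
        clean.append((int(row["order_id"]), customer, amount_cents, row["order_date"]))
    return clean

def load(conn, rows):
    """Load: write the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, "
        "amount_cents INTEGER, order_date TEXT)"
    )
    with conn:
        conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", rows)

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(RAW_CSV)))
loaded = warehouse.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall()
```

Order 2 never reaches the warehouse because its amount fails the transform step, which is the kind of editing the ETL process is meant to do before loading.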
Chapter 6: Identify five key characteristics associated with big data.
Volume, velocity, value, variety, and veracity
Define the role of a database administrator.
A skilled and trained IS professional who holds discussions with business users to define their data needs, applies programming languages to create databases that meet those needs, tests and evaluates the databases, monitors them for possible improvements, and ensures that data is secure.
Define the components of the data hierarchy including attribute, entity, record, file, and database.
Attribute: a characteristic of an entity; a column in a table
Entity: a person, place, or thing for which data is collected, stored, and maintained
Record: a collection of attributes of an entity; a row in a table, because a row spans multiple columns
File: a collection of entities
Database: a well-designed, organized, and carefully managed collection of data
Identify six benefits gained through use of high-quality data.
- Improves decision making
- Improves innovation
- Increases customer satisfaction
- Increases sales
- Raises productivity
- Ensures compliance
The nine characteristics of high-quality data are: accessible, accurate, complete, economical, relevant, reliable, secure, timely, and verifiable.
Identify the primary advantage of in-memory database in processing big data.
In-memory databases store the entire database in RAM, which allows much faster access to the data than secondary storage does. This speed makes them practical for analyzing big data and for other challenging data-processing applications.
Identify the two primary components of the Hadoop computing environment.
1. A data processing component (MapReduce)
2. A distributed file system (the Hadoop Distributed File System, HDFS), which stores data by dividing it into subsets and distributing the subsets onto different servers for processing
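The division of labor between the two components can be imitated in plain Python. This sketch does not use Hadoop or its API: the `chunks` list stands in for the subsets of data that HDFS would place on different servers, and the function names `map_phase`, `shuffle`, and `reduce_phase` are labels chosen here for the usual stages of a MapReduce word count.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: each server turns its own chunk into (key, value) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Shuffle: group all values that share a key so one reducer sees them together."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the grouped values for one key into a single result."""
    return key, sum(values)

# Each string stands in for a subset of the data stored on a different server.
chunks = ["big data big value", "data velocity", "big variety"]

mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
```

In a real cluster the map calls would run in parallel on the servers that hold each chunk, so the data does not have to be moved to a central machine first.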
Identify three factors driving the need for data management.
1. The need to meet external regulations designed to manage risk associated with the misstatement of financial data
2. The need to avoid the accidental release of sensitive data
3. The need to ensure that key business decisions are made using high-quality data
Identify five key challenges associated with big data.
1. How to choose what subset of data to store
2. Where and how to store the data
3. How to find the data that is relevant to the decision at hand
4. How to derive value from the relevant data
5. How to identify which data should be protected from unauthorized access
Big data also increases the risk that organizations will fail to comply with government regulations, and it raises privacy and security concerns.
Define the term database management system.
(DBMS) A group of programs used to access and manage a database and provide an interface between the database, its users, and other application programs
Identify four key responsibilities of the data governance team.
1. Specifies who is responsible for various aspects of the data, including accuracy, accessibility, consistency, completeness, updating, and archiving
2. Defines processes for how data should be stored, archived, backed up, and protected from cyberattacks
3. Develops standards and procedures that define who is authorized to update, access, and use the data
4. Puts in place a set of controls and audit procedures to ensure compliance with data policies and government regulations
Identify seven key questions that must be answered when designing a database.
1. Content: What data should be collected and at what cost?
2. Access: What data should be provided to which users and when?
3. Logical structure: How should data be arranged so that it makes sense to the user?
4. Physical organization: Where should data be physically located?
5. Response time: How quickly must the data be updated and retrieved so it can be viewed by users?
6. Archiving: How long should this data be stored?
7. Security: How can this data be protected from unauthorized access?
Identify six fundamental characteristics of the relational database model.
1. Data is organized into collections of 2D tables called relations
2. Each row represents an entity and each column represents an attribute of that entity
3. Each row in a table has a unique identifier called a primary key
4. The type of data a table column can contain can be specified as integer, decimal, date, text, etc.
5. The data in a table column can be constrained to be a certain type, a certain length, or have a value between two limits
6. Primary and foreign keys allow relationships between tables
7. User queries allow the user to add, change, or delete data
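Most of these characteristics can be seen in a small sketch built on Python's `sqlite3` module. The `department` and `employee` tables, their columns, and the limits in the `CHECK` clauses are all made up for the example; SQLite only enforces foreign keys when the `foreign_keys` pragma is switched on, which the sketch does explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign key checks off by default

conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,              -- primary key: unique row identifier
    name    TEXT NOT NULL                     -- column data type
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL CHECK (length(name) <= 40),        -- length constraint
    salary  REAL CHECK (salary BETWEEN 0 AND 500000),        -- value between two limits
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)  -- foreign key
);
""")

with conn:
    conn.execute("INSERT INTO department VALUES (1, 'Finance')")
    conn.execute("INSERT INTO employee VALUES (10, 'Ana', 62000, 1)")

def try_insert(sql):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

dup_key = try_insert("INSERT INTO employee VALUES (10, 'Ben', 50000, 1)")   # reuses a primary key
bad_fk = try_insert("INSERT INTO employee VALUES (11, 'Cy', 50000, 99)")    # no department 99
bad_range = try_insert("INSERT INTO employee VALUES (12, 'Di', -5, 1)")     # salary below limit

# A user query that follows the foreign key to relate the two tables.
joined = conn.execute(
    "SELECT e.name, d.name FROM employee e "
    "JOIN department d ON d.dept_id = e.dept_id"
).fetchall()
```

All three rejected inserts fail for a different reason (primary key, foreign key, range constraint), so only the original employee row appears in the join.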
Identify five broad categories of business intelligence/analytics techniques including the specific techniques used in each.
1. Descriptive analysis - specific techniques: visual analytics and regression analysis
2. Predictive analysis - specific techniques: time series analysis and data mining
3. Optimization - specific techniques: genetic algorithm and linear programming
4. Simulation - specific techniques: scenario analysis and Monte Carlo simulation
5. Text and video analysis - specific techniques: text analysis and video analysis
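To make one of the listed techniques concrete, here is a minimal Monte Carlo simulation sketch. The business scenario is invented: the price of 10, the fixed cost of 4500, and the ranges for units sold and unit cost are assumptions, and `simulate_profit` is a name chosen for this example. The idea is simply to sample the uncertain inputs many times and summarize the outcomes.

```python
import random

def simulate_profit(trials=10_000, seed=42):
    """Estimate average profit and the chance of a loss by random sampling."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    profits = []
    for _ in range(trials):
        units = rng.randint(800, 1200)      # uncertain demand (assumed range)
        unit_cost = rng.uniform(4.0, 6.0)   # uncertain cost per unit (assumed range)
        profit = units * (10.0 - unit_cost) - 4500.0  # price 10, fixed cost 4500
        profits.append(profit)
    mean_profit = sum(profits) / trials
    loss_probability = sum(p < 0 for p in profits) / trials
    return mean_profit, loss_probability

mean_profit, loss_probability = simulate_profit()
```

Scenario analysis would evaluate the same profit formula for a handful of hand-picked cases (best, worst, most likely) instead of thousands of random draws.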
Identify two advantages associated with database as a service (DaaS).
1. Eliminates installation, maintenance, and monitoring of databases, which reduces hardware, software, and staffing costs
2. The service provider can allocate more or less database storage based on the customer's changing needs
Identify four potential issues that arise with the use of self-service analytics.
1. If not well managed, it raises the risk of incorrect analysis and reporting, which could lead to damaging decisions
2. Different analyses can yield inconsistent results, which wastes time on explaining the differences and can double the time and money spent on analysis
3. It can lead to overspending on unapproved data sources and business analytics tools
4. It can make problems worse by eliminating the checks and balances on data preparation and use, which weakens data governance
Identify six functions performed by a database management system.
1. Provide a user view: give users access to the database
2. Create and modify the database
3. Store and retrieve data: act as an interface between an application program and the database
4. Manipulate data and generate reports
5. Manage security
6. Back up and recover data
State the primary difference between business intelligence and analytics.
Business intelligence (BI) is a wide range of applications, practices, and technologies for extracting, transforming, integrating, visualizing, analyzing, interpreting, and presenting data to support improved decision making. Analytics is the extensive use of data and quantitative analysis to support fact-based decision making.
Distinguish between data management and data governance.
Data management: an integrated set of functions that defines the processes by which data is obtained, certified, stored, secured, and processed. It ensures that the accessibility, reliability, and timeliness of the data meet users' needs.
Data governance: defines roles, responsibilities, and processes. It ensures that data can be trusted by the whole organization and that people are in place to fix and prevent issues with data.
Distinguish between the terms data warehouse, data mart, and data lake.
Data warehouse: a large database that holds business information from many sources in the enterprise. It covers all aspects of the company's processes, products, and customers.
Data mart: a subset of a data warehouse used by small and medium-sized businesses and by departments within an enterprise to support decision making.
Data lake: takes a "store everything" approach to big data, keeping all data in its raw, unaltered form.
Data warehouses and data marts both support decision making. They give companies an analyzable copy of the data captured by online transaction processing (OLTP) systems, which are built to capture data but do not support the kind of data analysis required today.
Distinguish data from information and knowledge.
Data: raw facts. Four types of data are alphanumeric, video, audio, and image data.
Information: a collection of data organized so that it has context; for example, presenting revenue for 2020 next to revenue for 2021 turns two raw figures into a comparison.
Knowledge: an understanding of information and the ability to use it to make decisions.
Define the roles of the database schema, data definition language, and data manipulation language.
Database schema: a description of the database's logical and physical structure. It identifies the tables, the attributes in each table, and the relationships between tables and attributes. It acts as a reference for the DBMS, which consults the schema whenever it needs to know how the data is organized.
Data definition language (DDL): a collection of instructions and commands that define and describe the data in a database.
Data manipulation language (DML): a specific language provided with a DBMS that allows users to modify the data, make queries, and generate reports.
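The three roles can be told apart in a short sketch that runs SQL through Python's `sqlite3` module. The `product` table and its rows are invented for the example, and `PRAGMA table_info` is SQLite's own way of reading back the schema; other DBMSs expose the schema differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: defines and describes the data (creates the table and its columns).
conn.execute(
    "CREATE TABLE product ("
    "sku TEXT PRIMARY KEY, name TEXT NOT NULL, price REAL NOT NULL)"
)

# Schema: the stored description of the structure, which can be read back.
schema_columns = [row[1] for row in conn.execute("PRAGMA table_info(product)")]

# DML: modifies the data held inside that structure.
with conn:
    conn.execute("INSERT INTO product VALUES ('A1', 'Widget', 2.50)")
    conn.execute("INSERT INTO product VALUES ('B2', 'Gadget', 7.00)")
    conn.execute("UPDATE product SET price = 3.00 WHERE sku = 'A1'")
    conn.execute("DELETE FROM product WHERE sku = 'B2'")

# DML also covers queries, which are the basis of reports.
report = conn.execute("SELECT sku, name, price FROM product").fetchall()
```

The `CREATE TABLE` statement changes the structure of the database, while the `INSERT`, `UPDATE`, `DELETE`, and `SELECT` statements only work with the rows stored in it.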
Define the term data cleansing.
Data cleansing detects and then corrects or deletes incorrect, incomplete, inaccurate, or irrelevant records that are already in the database. It improves the quality of the data used in decision making. It differs from data validation, which rejects incorrect data at the time of entry, before it is ever stored (for example, refusing letters typed into a numeric field).
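The contrast between cleansing and validation can be sketched in plain Python. The record layout, the rule that an email must contain "@", and the function names `validate` and `cleanse` are assumptions made for this example; real cleansing rules depend on the data.

```python
def validate(record):
    """Validation: reject a bad record at the time of entry."""
    email = record.get("email") or ""
    if "@" not in email:
        raise ValueError("invalid email")
    return record

def cleanse(records):
    """Cleansing: repair or remove bad records that are already stored."""
    seen = set()
    clean = []
    for record in records:
        email = (record.get("email") or "").strip().lower()  # correct formatting
        if "@" not in email:
            continue  # delete incomplete or incorrect records
        if email in seen:
            continue  # delete duplicate records
        seen.add(email)
        clean.append({"name": record["name"].strip().title(), "email": email})
    return clean

stored = [
    {"name": " ana ", "email": "ANA@example.com"},
    {"name": "Ana", "email": "ana@example.com "},
    {"name": "Ben", "email": None},
    {"name": "cy", "email": "cy-at-example.com"},
]
cleaned = cleanse(stored)

try:
    validate({"name": "Ben", "email": "ben-at-example.com"})
    rejected = False
except ValueError:
    rejected = True
```

Here `cleanse` works on records that already exist, fixing what it can and dropping the rest, while `validate` stops a bad record before it is stored.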
Identify three key organizational components that must be in place for an organization to get real value from its BI/analytics efforts.
- Existence of a solid data management program (which includes data governance)
- Creative data scientists
- Strong commitment to data-driven decision making