Part 1: Section F LOS
Describe how an analyst would mine large data sets to reveal patterns and provide insights
1. COLLECT. An organization collects data and loads it into a data warehouse.
2. MANAGE. The data is stored and managed either on in-house servers or in the cloud. Data visualization tools are used at this step to explore the properties of the data and confirm that it will help achieve the goals of the business.
3. PREPARE. Business analysts, management teams, and information technology professionals access the data and determine how they would like to organize it. What do we want to get?
4. SORT. Application software sorts the data based on the desired results and uses "data modeling" and "mathematical models" to find patterns in the data.
5. PRESENT. The data is presented in a readable and shareable format, such as a graph or table, created using a business intelligence platform, and shared across everyday business operations as a single source of truth.
Recognize potential applications of blockchain, distributed ledger, and smart contracts
A distributed ledger is a database that is housed in several locations or among several participants; data and transactions are not all processed and validated in one central location. The data is typically not stored until consensus is reached by all parties involved.
Blockchain is a special kind of database or distributed ledger with individual records, or blocks, that are linked together in a sequential list called a chain of blocks. These blocks are validated by multiple nodes or parties in a peer-to-peer network. Each block is linked to other blocks, making the blocks immutable (unchangeable). Blockchain is the underlying technology of cryptocurrencies such as Bitcoin, Ethereum, ...
Smart contracts are automatically executable software programs that represent an agreement between parties through software code instead of paper. The code is embedded and run in a blockchain protocol or platform without the need for any type of intermediary.
Application:
- reduce or eliminate intermediary costs,
- reduce the time lag for verification and reaching consensus,
- increase transaction transparency,
- provide a framework for trustworthy decentralization, and
- prevent and detect tax evasion, corruption, unlawful payments, money laundering, and misappropriation of assets.
The ability of blockchain technology to exchange data seamlessly through decentralized peer-to-peer networks, with all transactions immutably stored and available for audit, could make the supply chain system much more TRANSPARENT and RELIABLE. Blockchain technology and smart contracts could reduce or eliminate the "numerous documents" that accompany the international shipment of goods and the "time lag required to process" international financial transactions.
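The way linked blocks become hard to change can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the function names (`block_hash`, `build_chain`, `is_valid`) and the sample records are hypothetical, and there is no peer-to-peer network or consensus step here, only the hash linking between blocks.

```python
import hashlib
import json


def block_hash(index, data, prev_hash):
    """Hash the block contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records):
    """Link each record to the block before it (hypothetical helper)."""
    chain = []
    prev_hash = "0" * 64  # placeholder hash for the first block
    for index, data in enumerate(records):
        current = block_hash(index, data, prev_hash)
        chain.append(
            {"index": index, "data": data, "prev_hash": prev_hash, "hash": current}
        )
        prev_hash = current
    return chain


def is_valid(chain):
    """Recompute every hash; any edited block breaks the chain."""
    prev_hash = "0" * 64
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["data"], block["prev_hash"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


chain = build_chain(["pay vendor 100", "ship order 42", "receive payment 100"])
print(is_valid(chain))            # chain as built
chain[1]["data"] = "ship order 99"  # tamper with a stored record
print(is_valid(chain))            # tampering is detected
```

Because each block's hash depends on the previous block's hash, changing one record means every later block would have to be recomputed, which is what makes stored transactions auditable.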
Evaluate where technologies can improve efficiency and effectiveness of processing accounting data and information [e.g., artificial intelligence (AI)]
AI systems can now process the full transaction, from electronically capturing the purchase order data, to processing the data in the company's ERP (including recording and approval), to sending the purchase order electronically to the vendor. In addition to the obvious benefits of much greater speed and a much lower error rate, AI systems provide transparency throughout an accounting process, which allows professionals to monitor the process and take advantage of such things as purchase discounts.
Intelligent systems can be programmed to identify and interact with customers and vendors; capture, code, and process routine transactions such as invoices and purchase orders; track payment deadlines; and ensure the proper approvals are recorded in a timely manner.
!!!: Two areas where AI can improve accounting processes are: 1. data entry and analysis, and 2. reducing fraud.
Discuss how EPM can facilitate business planning and performance management
BPM:
- Improvement. It allows companies to collect data efficiently from various sources, analyze it, and use this knowledge to improve the company's performance.
- Correction. It also allows problems to be identified before they have a chance to grow and spread into other areas of the company.
- Forecast. It can be used to make more predictable and reliable forecasts.
Continuous and real-time reviews of data are used in the BPM process.
Determine the most effective channel to communicate results
Bar Chart = for comparisons
Pie Chart = compares proportions
Histogram = for distributions in LARGE data sets
Dot Plot = for distributions in SMALL data sets
Box Plot/Box-and-Whisker = shows comparisons among distributions
Map/Geochart = for locations
Scatterplot = relationship between two variables
Bubble Chart = a version of the scatterplot
Heat Map = relationship among variables
Define software as a service (SaaS) and explain its advantages and disadvantages
Software as a service (SaaS) is a common example of cloud computing. It is a software distribution model in which a third-party provider hosts applications on its own servers and makes them available to customers over the Internet.
SaaS applications exist for fundamental business technologies such as email, sales management, customer relationship management (CRM), financial management, human resource management (HRM), billing, and collaboration. Leading SaaS providers include Salesforce, Oracle, SAP, Intuit, and Microsoft [Office 365].
Advantages:
- Cost effective. Flexible payments and lower cost than building the software in-house. Avoids large up-front costs.
- High scalability. SaaS providers can offer different access options so customers get more or fewer features on demand.
- Automatic updates. Lower in-house IT burden.
- Accessibility and persistence.
Disadvantages:
- Reliance on an outside vendor
- Security breaches
- Service disruptions
- Software alterations
Explain how structured, semi-structured, and unstructured data is used by a business enterprise
Structured data is data that is "ORGANIZED" in such a way that computers can easily work with it. A common example is data that has been placed into a relational database. A relational database is made up of tables of rows and columns, and data is placed in a cell at the intersection of a row and a column.
Unstructured data is data that is "NOT ORGANIZED" in a way that lets it be easily queried. Unstructured data can be text, images, numbers, or audio. Email is a good example of unstructured data. Unstructured data makes up most of big data.
Semi-structured data lands somewhere between the two.
Define Big Data, explain the four Vs: volume, velocity, variety, and veracity, and describe the opportunities and challenges of leveraging insight from this data
Big data is a term that describes the large volume of data, both structured and unstructured, that inundates a business on a day-to-day basis. The terms "volume," "variety," "velocity," and "veracity" are typically used to describe big data:
Volume: refers to the quantity of data [e.g.]
Variety: deals with the types of data, such as numerical, textual, images, audio, and video
Velocity: refers to how quickly data can be generated and processed
Veracity: the quality of the data, i.e., its accuracy and trustworthiness
Value:
- Cost reduction
- Better service
!: During big-data collection efforts, pay extra attention to the SOURCING of data.
Identify the role of the accounting information system (AIS) in the value chain
Purpose: To provide relevant and reliable information to decision makers. The key elements of an AIS are the quality and reliability of data and the timeliness of information.
An AIS is a business process that captures and processes data and reports relevant and reliable information to decision makers. [CPR: Capture, Process, Report]
Primary Model: IPO Model [Input, Process, Output and Report]
Identify and explain the benefits and limitations of regression analysis and time series analysis
Regression Analysis
A:
- Linear regression is easy to implement and interpret, and very efficient to train.
D:
- Assumption of linearity. It assumes that the cause-and-effect relationship between the variables remains unchanged.
- Cannot be used for qualitative phenomena [e.g. crime, honesty].
- Prone to noise and overfitting: if the number of observations is less than the number of features, linear regression should not be used; otherwise it may overfit, because it starts fitting noise while building the model.
- Prone to outliers: linear regression is very sensitive to outliers (anomalies), so outliers should be analyzed and removed before applying linear regression to the data set.
Time-Series Analysis
A:
D:
Define robotic process automation (RPA) and its benefits
Robotic process automation is the use of specialized computer programs "to automate" and standardize repeatable business processes.
1. Improved Employee Morale = workers can dedicate more time to engaging and interesting work.
2. Productivity = process cycles are much faster than manual processing.
3. Reliability = bots work tirelessly 24/7 without interruption.
4. Consistency = routine tasks are performed the same way each and every time.
5. Accuracy = work is done with extreme accuracy and uniformity, and is much less prone to errors or typos.
6. Low Technical Barrier = minimal to no programming skills are necessary to configure a bot.
7. Compliance = provides an audit trail of the history of actions and changes to data.
Define sensitivity analysis and identify when it would be the appropriate tool to use
-
Demonstrate an understanding of the uses of simulation models, including the Monte Carlo technique
-
Demonstrate an understanding of what-if (or goal-seeking) analysis
-
Identify and explain the limitations of data analytics
-
Identify the benefits and limitations of sensitivity analysis and simulation models
-
Identify the stages of the data life cycle; i.e., data capture, data maintenance, data synthesis, data usage, data analytics, data publication, data archival, and data purging
8 Phases: 1. Data Capture 2. Data Maintenance 3. Data Synthesis 4. Data Usage 5. Data Analytics 6. Data Publication 7. Data Archival 8. Data Purging
> Data Capture: Data must first be recorded. It can be entered by hand, scanned by computers, or acquired by sensors.
> Data Maintenance: In order to be useful, data must be converted to a usable form. The process of creating usable data may include "cleansing, scrubbing, and processing" through an "extract-transform-load" (ETL) methodology. In essence, data cleansing and scrubbing transform unstructured data into structured data that can be used in an organization's information system.
> Data Synthesis: "the use of statistical methods to gain a first analysis". It can be defined as the creation of data values via inductive logic, using other data as input. It is the arena of analytics that uses modeling.
> Data Usage: the application of data as information to tasks that the enterprise needs to run and manage itself.
> Data Analytics: the "science of examining raw data" with the purpose of creating NEW information and "generating business insight", for example by finding patterns and relationships.
> Data Publication: the sending of data to a location outside of the enterprise. In being used, a single data value may be sent outside of the enterprise.
> Data Archival: the copying of data to an environment where it is stored in case it is needed again. A data archive is simply a place where data is stored, but where no maintenance, usage, or publication occurs. If necessary, the data can be restored to an environment where one or more of these occur.
> Data Purging: the removal of every copy of a data item from the enterprise. It is the sanitization of data.
Describe the opportunities and challenges of managing data analytics
Companies seek to capitalize on the unique insights gleaned from analyzing the data that has been collected over the years. These insights are used: - to help with customer acquisition and retention, - to identify and correct processes that cost more than they benefit the company, - to reduce costs through identifying inefficiencies in the company's production or operational processes, and others.
Explain how ERP helps overcome the challenges of separate financial and nonfinancial systems, integrating all aspects of an organization's activities
Data maintenance is the primary challenge of separate financial and nonfinancial systems. With an ERP system, the financial and nonfinancial systems of a business are combined. Data is integrated and synchronized in one system, allowing data to be maintained consistently across every aspect of the business.
Explain why data mining is an iterative process and both an art and a science
Data mining is not an exact science and can be viewed as both an art and a skill. Consider
Evaluate data visualization options and select the best presentation approach (e.g., histograms, box plots, scatter plots, dot plots, tables, dashboards, bar charts, pie charts, line charts, bubble charts)
Data visualization is one method to convey information to a reader.
Define a data warehouse
A data warehouse is a set of large databases consisting of detailed and summarized data that is used primarily for ANALYSIS rather than for processing transactions. It is essentially a repository or storage location for all of a company's data retrieved from various programs, sources, and databases. The data is typically cleaned and organized so it can be searched.
Define enterprise performance management (EPM) [also known as corporate performance management (CPM) or business performance management (BPM)]
Enterprise/Corporate/Business Performance Management [EPM/CPM/BPM] consists of (1) monitoring and (2) evaluating business performance so that an enterprise can reach performance goals, enhance efficiency, or maximize business processes. Whereas ERP systems help management with the day-to-day operations of a company, EPM is about managing the business through analysis, comprehension, and reporting.
!: "Reduced Human Intervention" is the MAJOR benefit of CPM software. Because the CPM software generates all the required reports automatically, there is no need to prepare the reports manually and hence no need to intervene in the functioning of the software.
Describe exploratory data analysis and how it is used to reveal patterns and discover insights
Exploratory data analysis is more of an approach than a set of techniques and tools. Exploratory data analysis uses visual or graphical tools as well as quantitative methods to find patterns in the data, to identify and extract important variables, to find outliers or anomalies included with the data set, to test assumptions and questions about the data, and to gain insight into the data set.
Identify the elements of both simple and multiple regression equations
Linear regression is a tool that helps us see the relationship between two variables: the dependent variable (or response variable) and the independent variable (or explanatory variable). The standard regression equation is y = mx + b + e, where
y = the dependent variable,
m = the slope of the regression line,
x = the independent variable,
b = the y-intercept, and
e = the error term.
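To make the elements of y = mx + b + e concrete, here is a minimal Python sketch that estimates m and b by ordinary least squares and then computes the error term e for each observation. The function name `fit_line` and the sample figures (units produced versus total cost) are assumptions made up for illustration.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b = mean_y - m * mean_x
    return m, b


# Hypothetical data: units produced (x) and total cost (y).
units = [1, 2, 3, 4, 5]
cost = [12, 14, 17, 18, 22]

m, b = fit_line(units, cost)
# e = actual y minus the value predicted by the line.
errors = [y - (m * x + b) for x, y in zip(units, cost)]

print(f"slope m = {m:.2f}, intercept b = {b:.2f}")
print("error terms:", [round(e, 2) for e in errors])
```

In this framing m reads as the estimated variable cost per unit and b as the estimated fixed cost, with e capturing whatever the line does not explain.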
Explain how to use predictive analytic techniques to draw insights and make recommendations
Predictive analytics: This application, which involves forecasting future opportunities and risks, is the most widely used application of regression analysis in business. For example, predictive analytics might involve demand analysis, which seeks to predict the number of items that consumers will purchase in the future. Using statistical formulas, predictive analytics might predict the number of shoppers who will pass in front of a given billboard and then use that information to place billboards where they will be most visible to potential shoppers.
Identify and explain the challenges of having separate financial and nonfinancial systems
Primary Challenge: Data Maintenance
Financial and nonfinancial systems record, track, and report on the health of the business. The difference is that one uses financial metrics and the other uses nonfinancial metrics. The problem with separate financial and nonfinancial systems is making sure that the data is linked accurately in both systems. When the two systems are separate, the data should be compared to make sure that the systems are measuring the same thing using different metrics.
Explain the role of business process analysis in improving system performance
Processes were not always developed with an eye on how they would fit into the overall structure and objectives of the business. Instead, they were created based on the easiest way to get the job done, with little thought for long-term consequences. After a business process is in place, inertia takes over and it is difficult to change, which can retard growth.
!: Business process analysis is a systematic method to study all of a company's business processes to determine how they can be improved.
Four Steps:
1. Identify the Process [Start-to-End]. Clearly identify the process, who is involved, and what is currently being done, with a clearly defined starting and ending point.
2. Walkthrough. Do a walk-through of the process to document it clearly and fully; look for gaps in the process where information is lost or misdirected.
3. Examine. Examine the current process to identify strong areas and areas that can be improved, such as a. bottlenecks, b. friction points, and c. weaknesses. Look for ways to add value to the process.
4. Propose. Propose a plan for improvement.
Explain how query tools [e.g., Structured Query Language (SQL)] are used to retrieve information
SQL (Structured Query Language) is a programming language used to communicate with data stored in a relational database management system. It is an established tool used to mine large data sets. There are three basic commands in SQL: 1. SELECT 2. FROM 3. WHERE
> SELECT selects the data fields that are of interest to the user. In other words, what does the user want to see as a result of the query?
> FROM identifies the tables where the data is located.
> WHERE restricts the data so that it meets certain criteria, such as WHERE Date < December 31.
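The three commands can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch under assumptions: the `invoices` table, its columns, the sample rows, and the helper name `invoices_before` are all hypothetical, and dates are stored as ISO text (YYYY-MM-DD) so that a plain `<` comparison orders them correctly.

```python
import sqlite3


def make_sample_db():
    """Create an in-memory database with a small, made-up invoices table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices ("
        "invoice_id INTEGER, customer TEXT, invoice_date TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO invoices VALUES (?, ?, ?, ?)",
        [
            (1, "Acme", "2023-11-15", 1200.0),
            (2, "Bolt", "2023-12-30", 450.0),
            (3, "Acme", "2024-01-05", 980.0),
        ],
    )
    return conn


def invoices_before(conn, cutoff_date):
    """SELECT the fields of interest, FROM one table, WHERE a criterion holds."""
    query = (
        "SELECT invoice_id, customer, amount "  # which fields to show
        "FROM invoices "                        # which table holds them
        "WHERE invoice_date < ?"                # which rows qualify
    )
    return conn.execute(query, (cutoff_date,)).fetchall()


conn = make_sample_db()
for row in invoices_before(conn, "2023-12-31"):
    print(row)
```

The `?` placeholder passes the cutoff date as a parameter instead of pasting it into the query text, which is the usual way to avoid malformed or unsafe queries.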
SQL vs. QBE
Structured Query Language (SQL) is a "data manipulation language" that is heavily used in relational database management systems. SQL statements are written as SQL scripts, without a graphical user interface. SQL is used to enter, modify, update, and query data in the database.
Query-by-example (QBE) is a "graphical query language" that is also heavily used in relational database management systems. A QBE query is an intermediate step: it is converted into SQL in the background for final execution.
Define the systems development life cycle (SDLC), including systems analysis, conceptual design, physical design, implementation and conversion, and operations and maintenance
Systems development life cycle (SDLC) is a structured road map for (1) designing and (2) implementing a NEW information system.
Basic 5 Steps: 1. Systems Analysis 2. Conceptual Design 3. Physical Design 4. Implementation and Conversion 5. Operations and Maintenance
> Systems Analysis: "identifying" the needs of the organization and assembling the information regarding a. modifying the current system, b. purchasing a new system, and c. developing a new system.
> Conceptual Design: "creating" a plan for meeting the needs of the organization. Design alternatives are prepared and detailed specifications are created to provide instruction on how to achieve the desired system.
> Physical Design: taking the conceptual design and creating detailed specifications for building the system. The design includes specifications for computer code, inputs, outputs, data files and databases, processes and procedures, as well as proper controls.
> Implementation and Conversion: the "installation" of the new system, including hardware and software. The new system is tested and users are trained. New standards, procedures, and controls are instituted.
> Operations and Maintenance: "running" the system, checking performance, making adjustments as necessary, and maintaining the system.
Define business intelligence (BI); i.e., the collection of applications, tools, and best practices that transform data into actionable information in order to make better decisions and optimize performance
The aim of business intelligence and artificial intelligence is to create computer programs that can MIMIC human insight. This involves capturing data and preserving it so that it can be applied in actual decision-making settings.
Business intelligence (BI) leverages software and services to transform data into actionable insights that inform an organization's strategic and tactical business decisions. BI offers a way for people to examine data to understand trends and derive insights by streamlining the effort needed to search for, merge, and query the data necessary to make sound business decisions.
The term business intelligence often also refers to a range of tools that provide quick, easy-to-digest access to insights about an organization's current state, based on available data.
Business intelligence is descriptive: it tells you what is happening now and what happened in the past to get to that state. Business analytics, on the other hand, is an umbrella term for data analysis techniques that are predictive (they can tell you what is going to happen in the future) and prescriptive (they can tell you what you should be doing to create better outcomes).
Examples of business intelligence tools:
- Dashboards
- Visualizations
- Reporting
- Data mining
- ETL (extract-transform-load: tools that import data from one data store into another)
- OLAP (online analytical processing)
Define enterprise resource planning (ERP) and identify and explain the advantages and disadvantages of ERP
Enterprise resource planning (ERP) is a process used by companies to manage and integrate the important parts of their businesses. An ERP software system can integrate planning, purchasing, inventory, sales, marketing, finance, human resources, and more. ERP systems store all company data in one central database. In addition, the ERP stores nonfinancial data related to all aspects of the company, from employee health plans, to equipment maintenance records, to customer phone numbers, to capital expenditure budgets, to marketing campaigns, and everything in between.
Advantages:
- Optimization/synchronization of business processes
- Accurate and timely access to reliable information
- Ability to share information between all components of the organization
- Reduction of time and cost
- Improved customer service with reduced response time
Disadvantages:
- Installation of the ERP system is costly
- Success requires a skilled and experienced workforce
- The system can be difficult to use
- Does NOT guarantee success
- Requires radical change: if there is resistance to change, the effectiveness of the ERP may decrease
Demonstrate a general understanding of data governance frameworks, COSO's Internal Control framework and ISACA's COBIT (Control Objectives for Information and Related Technologies)
Two primary data governance frameworks: 1. by the Committee of Sponsoring Organizations (COSO) and 2. by the Information Systems Audit and Control Association (ISACA). The COSO framework deals with GENERAL data governance; the focus of the ISACA framework is on data governance as it relates to information technology (IT), specifically the Control Objectives for Information and Related Technologies (COBIT) framework.
A. COSO:
> The framework helps users address internal controls at the operational, financial reporting, and compliance levels, and by unit or activity.
> The framework is designed to break down the task of designing effective internal controls into five areas of focus.
> Five focus areas [17 principles]: 1. Control Environment, 2. Risk Assessment [not Risk Response], 3. Control Activities, 4. Information & Communication, and 5. Monitoring
B. COBIT:
> COBIT is focused on effective internal control as it relates to IT. The framework provides best practices for effectively managing controls over IT. It is a voluminous and very detailed set of manuals for creating, implementing, and maintaining IT-related controls.
> Four parts [32 processes]: 1. Plan and Organize, 2. Acquire and Implement, 3. Deliver and Support, and 4. Monitor and Evaluate
Explain why data and data science capability are strategic assets
A company's data is considered one of its most valuable assets. The ability to tap into that asset and make better decisions represents the strategic value of a company's data and of its ability to analyze that data. As the world becomes smarter and smarter, data becomes the key to competitive advantage, meaning a company's ability to compete will increasingly be driven by how well it can leverage data, apply analytics, and implement new technologies. Therefore, if companies want to avoid drowning in data, they need to develop a smart strategy that focuses on the data they really need to achieve their goals. To be truly useful in a business sense, data must address a specific business need, help the organisation reach its strategic goals, and generate real value.
Discuss the importance of having a documented record retention (or records management) policy
A competent record retention policy is necessary for every organization. Records must be kept and maintained for internal use as long as they are needed by users to research, analyze, and document past events and decisions. In addition, records must be preserved to meet legal and regulatory requirements.
Describe the challenges of data mining
- Incomplete and Noisy Data. Real-world data is heterogeneous, incomplete, and noisy. Data in large quantities will normally be inaccurate or unreliable. Some customers may not be willing to disclose their email address, which results in incomplete data. The data could also be altered by system or human errors. All of this makes data mining really challenging. Noisy data refers to large amounts of meaningless data.
- Distributed Data. Real-world data is usually stored on different platforms in distributed computing environments. It could be in databases, individual systems, or even on the Internet. It is practically very difficult to bring all the data into a centralized data repository, mainly for organizational and technical reasons.
- Complex Data. Real-world data is heterogeneous and could include multimedia data (images, audio, and video), temporal data, spatial data, time series, natural language text, and so on. It is really difficult to handle these different kinds of data and extract the required information.
- Performance. The performance of a data mining system depends mainly on the efficiency of the algorithms and techniques used. If they are not up to the mark, the performance of the data mining process suffers.
- Data Privacy and Security. Data mining normally leads to serious issues in terms of data security, privacy, and governance. For example, when a retailer analyzes purchase details, it reveals information about the buying habits and preferences of customers without their permission.
Different cycles
1. Revenue
A typical revenue or sales process starts when a company receives a purchase order from a customer. Any credit purchases are approved by the credit manager.
Receive PO > Check Customer Credit > Check Inventory Availability > Prepare Sales Order > Prepare Inventory Ticket > Prepare Shipping Documents [Packing Slip & BoL] > Pack and Ship > Send Invoice to Customer, Record Sales & A/R > Receive Payment (Clerk) > Record Receipt.
2. Expenditure
Create Purchase Requisition (Requesting Dept.) > Create "Purchase Order" and Send to Vendor (Purchasing Dept.) > Vendor Ships Items > Count, Inspect, and Prepare "Receiving Report" (Receiving Dept.) > Transfer to Inventory [Inventory Control Records Receipt] > Receive PO, RR, and "Invoice from Vendor" (Accounts Payable Dept.) > Create Voucher Package [PO + RR + Invoice] > A/P is Updated > Cash Disbursements Prepares Payment > Check is Signed by an Authorized Signer and Sent > A/P is Updated.
Proof of order = Purchase Order
Proof of receipt = Receiving Report
Proof of billing = Vendor's Invoice
3. Production
There are 4 major activities in the operations cycle: Product Design, Planning and Scheduling, Production Operations, and Cost Management.
4. Human Resource
Function: to compensate employees for the work they have performed and to take care of taxes related to payroll.
Main Steps: Update Master Payroll Data > Record Time Data > Prepare Payroll > Disburse Payroll > Disburse Taxes and Other Deductions
5. Financing
6. Fixed Assets
Acquire > Maintain and Depreciate > Dispose
7. General Ledger
Update General Ledger > Post Adjusting Entries > Prepare Financial Statements > Produce Managerial Reports
Define the following analytic models: clustering, classification, and regression; determine when each would be the appropriate tool to use
3 Data Analytical Models: 1. Clustering 2. Classification 3. Regression
> Clustering: seeks to find similar groups of data points within a data set. Clustering is most commonly used in market research, where a company wants to better understand the "preferences" of various groups of customers.
> Classification: seeks to group data points into classifications or categories. The primary difference between classification and clustering is that classification has set, predefined categories and seeks to place the data points into them. In contrast, clustering seeks to find those categories by analyzing the data. Classification may be used to classify customers by "demographic characteristics" and their "propensity to favor certain products".
> Regression: seeks to predict a number based on a model or equation. The data is continuous or numerical in nature.
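The difference between predefined categories (classification) and discovered groups (clustering) can be sketched in plain Python. Everything here is an illustrative assumption: the function names, the monthly-spend figures, the category labels, and the choice of a basic one-dimensional k-means loop as the clustering method.

```python
def classify(value, category_centres):
    """Classification: place a value into the closest PREDEFINED category."""
    return min(category_centres, key=lambda label: abs(value - category_centres[label]))


def cluster_1d(values, k=2, iterations=10):
    """Clustering: let the data reveal k groups (basic k-means, 1-D)."""
    if k < 2 or k > len(values):
        raise ValueError("k must be between 2 and the number of values")
    ordered = sorted(values)
    # Start with centres spread evenly across the sorted data.
    centres = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        # Move each centre to the mean of its group (keep it if the group is empty).
        centres = [sum(g) / len(g) if g else centres[i] for i, g in enumerate(groups)]
    return groups


# Hypothetical monthly spend per customer.
spend = [20, 22, 25, 80, 85, 90]

# Clustering: no labels are supplied; the groups come out of the data.
print(cluster_1d(spend, k=2))

# Classification: labels are fixed in advance and each customer is assigned to one.
labels = {"budget": 25, "premium": 85}
print(classify(30, labels))
```

Regression, the third model, would instead predict a number (for example next month's spend) rather than a group or category.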
Define relational database and demonstrate an understanding of a database management system
> A database is a set of data stored in a computer.
> A relational database is a type of database. It uses a structure that allows us to identify and access data in relation to another piece of data in the database. Often, data in a relational database is organized into tables of rows and columns.
> A relational database management system (RDBMS) is a program that allows you to (1) create, (2) update, and (3) administer a relational database. Most relational database management systems use the SQL language to access the database.
> A database management system (DBMS) is the interface or program between the database and the application programs [e.g. Sales Application, Shipping Application, Inventory Control Application] that access the database. A DBMS facilitates creating, retrieving, updating, managing, and protecting data. Two primary components: Data and Database Program.
> A database schema, or blueprint, defines the logical structure of the database, or the way humans view the data. This allows users to access the database without knowing where the data is physically located.
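Accessing data "in relation to another piece of data" can be illustrated with two linked tables in Python's built-in `sqlite3` module, with SQLite standing in for the RDBMS. The schema, table names, and sample rows below are hypothetical and chosen only for the example.

```python
import sqlite3


def build_db():
    """Create two related tables: each order points at a customer."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        );
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
            total       REAL NOT NULL
        );
        INSERT INTO customers VALUES (1, 'Acme'), (2, 'Bolt');
        INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
        """
    )
    return conn


def totals_by_customer(conn):
    """Join the tables on the shared key and total the orders per customer."""
    return conn.execute(
        """
        SELECT c.name, SUM(o.total)
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.customer_id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()


conn = build_db()
print(totals_by_customer(conn))
```

The `CREATE TABLE` statements play the role of the schema (the logical structure), while the application only asks for names and totals and never needs to know where the rows are physically stored.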
Define standard error of the estimate, goodness of fit, and confidence interval
> Standard Error of Estimate is the "measure of variation" of the observations around the computed regression line. A higher standard error means the observations deviate more from the regression line.
> Goodness of Fit of a statistical model describes how well it fits a set of observations. It measures HOW closely the model MATCHES or FITS the observations or data points in the sample. It helps in understanding the QUALITY of the REGRESSION EQUATION.
> Confidence Interval measures the degree of uncertainty or certainty in a sampling method. A confidence interval can be built for any probability, with 95% and 99% confidence levels being the most common. A confidence interval is a range of values that would likely contain an unknown population parameter. The confidence level is the percentage of probability, or certainty, that the confidence interval would contain the true population parameter when you draw a random sample many times. Or, in the vernacular, "We are 99% certain (confidence level) that most of these datasets (confidence intervals) contain the true population parameter."
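A short Python sketch of two of these measures follows. It rests on assumptions that should be stated plainly: the standard error of the estimate is computed with the common simple-regression formula sqrt(SSE / (n - 2)), and the confidence interval is a large-sample (normal, z-based) interval for a mean, whereas a small sample would normally call for the t-distribution. Function names and data are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def standard_error_of_estimate(xs, ys, m, b):
    """Typical distance of observations from the line y = m*x + b.

    Uses sqrt(SSE / (n - 2)), the usual formula for simple regression.
    """
    sse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))


def mean_confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the population mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    half_width = z * stdev(sample) / math.sqrt(len(sample))
    centre = mean(sample)
    return centre - half_width, centre + half_width


# Hypothetical observations and an already-fitted line y = 2x + 1.
xs = [0, 1, 2, 3]
ys = [1.5, 2.5, 5.5, 6.5]
print(standard_error_of_estimate(xs, ys, m=2, b=1))

sample = [10, 12, 14, 16, 18]
print(mean_confidence_interval(sample, level=0.95))
print(mean_confidence_interval(sample, level=0.99))  # wider interval
```

Raising the confidence level from 95% to 99% widens the interval: more certainty about containing the true parameter costs precision.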
Define cloud computing and describe how it can improve efficiency
Cloud computing is, simply put, outsourcing computer operations to an outside provider. Companies effectively outsource their hardware capital expenditures and their computing obsolescence risk to these providers. They may also derive the tax benefit of directly expensing these costs instead of capitalizing, maintaining, and depreciating the cost of hardware and software over the useful life of the asset.
Cloud computing comes with some notable risks, such as the risk that the cloud provider will not provide sufficient security for data, which may then be lost, stolen, or corrupted. Another risk is that the cloud provider may not provide the level of service required to support the operations of the company.
3 main types of cloud computing [from most third-party reliance to least third-party reliance]:
1. SaaS [Software as a Service]: software made available by a third party over the internet. Examples: Dropbox, Gmail, Facebook
2. PaaS [Platform as a Service]: hardware and software platform available over the internet. Example: Microsoft Azure (formerly Windows Azure)
3. IaaS [Infrastructure as a Service]: cloud-based infrastructure services [e.g. storage, networking]. Example: Amazon Web Services
It can be viewed like a pizza service:
1. SaaS = Dining Out [you prepare nothing]
2. PaaS = Pizza Delivery [you only need the table (application) and the third party will prepare the rest]
3. IaaS = Take and Bake [you need the table and the equipment to bake, and the third party will provide the ingredients]
4. On-Premise = Made at Home
Demonstrate an understanding of the coefficient of determination (R squared) and the correlation coefficient (R)
Correlation coefficient (R) measures how two variables "move together".
> If they move perfectly together, R = 1 (perfect positive correlation).
> If they move in exactly opposite directions, R = -1 (perfect negative correlation).
> If they do not move together at all, R = 0 (no correlation).
Coefficient of determination (R-squared) tells how well the model performed. Essentially, R-squared states how much of the variation in the data is explained by the model, or rather, how well the model (equation) fits the data. A higher coefficient indicates a better "goodness of fit for the observations". R-squared is always between 0 and 100%:
> 0% indicates that the model explains none of the variability of the response data around its mean.
> 100% indicates that the model explains all the variability of the response data around its mean.
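The two coefficients can be computed by hand as a check on the definitions above. The sketch below uses only the standard library and assumes a simple linear regression with a single independent variable, in which case R-squared equals the square of R. The sample values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient R: +1, -1, or 0 at the extremes described above."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared(xs, ys):
    """Share of the variation in y explained by a one-variable linear model."""
    return pearson_r(xs, ys) ** 2


# Hypothetical observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
r2 = r_squared(xs, ys)
```

For this data the model explains about 60% of the variation in y around its mean; the remaining 40% is unexplained.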
Identify and explain controls and tools to detect and thwart cyberattacks, such as penetration and vulnerability testing, biometrics, advanced firewalls, and access controls
Cyberattack Detection and Prevention:
1. Penetration testing - also called a pentest or ethical hacking, is an authorized simulated cyberattack on a computer system, performed to evaluate the security of the system.
2. Biometric Identification - Each information system user should authenticate to the system. The three primary methods of authentication are 1) something the user knows, such as a password, 2) something the user has, such as an identification card, and 3) something the user is, a biometric such as a fingerprint.
3. Firewalls - establish a barrier between your internal network and incoming traffic from external sources (such as the internet) in order to block malicious traffic like viruses and hackers. Methods include:
a. Packet filtering examines the "source address" of a packet sent to a company's network and compares it to a list of sources of malicious packets. If the address is on the list, the packet does not enter the company's information system.
b. Deep packet inspection "scans the contents" of packets for malicious code before allowing the packet to enter the company's information system.
4. Vulnerability Testing - is a process that detects and classifies security loopholes (vulnerabilities) in the infrastructure.
5. Access control - is the selective restriction of access to a place or other resource.
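The difference between the two firewall methods can be sketched in a few lines. This is a toy model, not how a production firewall is built: the blocked network (a reserved documentation address range), the byte signatures, and all function names are hypothetical.

```python
import ipaddress

# Hypothetical list of sources of malicious packets (a documentation-only range).
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]

# Hypothetical byte patterns treated as malicious content.
MALICIOUS_SIGNATURES = [b"DROP TABLE", b"<script>"]


def packet_filter_allows(source_ip):
    """Packet filtering: look only at the source address."""
    address = ipaddress.ip_address(source_ip)
    return not any(address in network for network in BLOCKED_NETWORKS)


def deep_inspection_allows(payload):
    """Deep packet inspection: scan the contents of the packet."""
    return not any(signature in payload for signature in MALICIOUS_SIGNATURES)


def firewall_allows(source_ip, payload):
    """A packet enters only if it passes both checks."""
    return packet_filter_allows(source_ip) and deep_inspection_allows(payload)
```

A packet from an unlisted address passes the packet filter even if its payload is malicious, which is why content inspection is applied as a second layer.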
Define the different types of data analytics, including descriptive, diagnostic, predictive, and prescriptive
Data analytics can be viewed as a function of value and complexity. Data analytics is the process of extracting, transforming, loading, modelling, and drawing conclusions from data to make decisions.
4 types of Data Analytics (from least to most valuable and complex) [Clue: the order is alphabetical]
1. Descriptive: What happened?
2. Diagnostic: Why did it happen?
3. Predictive: What will happen?
4. Prescriptive: How can we make it happen?
Hindsight/Past Oriented: Descriptive and Diagnostic
Foresight/Future Oriented: Predictive and Prescriptive
> Descriptive: presents what happened. It is designed to get you basic expository information: who, what, when, where, how many?
> Diagnostic: helps you answer the question of why something happened. Queries and drill-downs enable you to request more detail related to a report, which can help explain surprising values.
> Predictive: helps you identify trends in relationships between variables, determine the strength of their correlation, and hypothesise causality.
> Prescriptive: is where artificial intelligence and big data come into play. Whereas statistical modelling is more about assessing correlation to prove or disprove a hypothesis, machine learning is about predicting outcomes based on numerous variables. Big data, staggeringly large sets of information often reflecting crowd behaviour and sourced from outside the company in question, is essential to machine learning because it is complex enough to refine the artificial intelligence's decisions over time.
Demonstrate an understanding of time series analyses, including trend, cyclical, seasonal, and irregular patterns
Data can be analyzed either as a cross-sectional data set or as a time series data set.
> Cross-sectional data comes from the observation of many entities at "one point in time". A good example of cross-sectional data is the stock returns earned by shareholders of Microsoft, IBM, and Samsung for the year ended 31st December 2018.
> Time series data is collected over a period of time instead of at a single point in time. A good example of time-series data is the daily or weekly closing price of a stock recorded over a period spanning 13 weeks.
A time series has four components:
1. Secular trend - the component of the time series that gives the general tendency of the data over a long period. It is the smooth, regular, long-term movement of a series (the steady growth). Example: growth of a population.
2. Seasonal variation - shows a different pattern in different seasons. This variation is periodic in nature and regular in character.
3. Cyclical fluctuations - fluctuations that usually last for more than a year. They are the effect of business cycles. In every business there are four important phases: i) prosperity, ii) decline, iii) depression, and iv) improvement or regain.
4. Irregular variations - these are, as the name suggests, totally unpredictable. The effects of floods, droughts, famines, earthquakes, etc. are known as irregular variations. All variations other than trend, seasonal, and cyclical variations are irregular.
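One common way to separate these components is a moving-average decomposition. The sketch below assumes an additive model (observation = trend + seasonal + irregular), quarterly data with an even period of 4, and no separate cyclical component, which would need many more years of data. The series is synthetic and the function names are hypothetical.

```python
def centered_moving_average(series, period=4):
    """Estimate the trend; works for an even period such as 4 quarters."""
    half = period // 2
    trend = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half : t + half + 1]
        # The two end points get half weight so the window covers one full period.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    return trend


def seasonal_indices(series, trend, period=4):
    """Average deviation from trend for each position in the seasonal cycle."""
    buckets = [[] for _ in range(period)]
    for t, (value, trend_value) in enumerate(zip(series, trend)):
        if trend_value is not None:
            buckets[t % period].append(value - trend_value)
    return [sum(bucket) / len(bucket) for bucket in buckets]


# Synthetic quarterly data: steady growth plus a repeating seasonal pattern.
true_seasonal = [2, -1, -3, 2]
series = [10 + t + true_seasonal[t % 4] for t in range(12)]

trend = centered_moving_average(series)
seasonal = seasonal_indices(series, trend)
# Whatever trend and seasonality do not explain is the irregular component.
irregular = [
    value - trend_value - seasonal[t % 4]
    for t, (value, trend_value) in enumerate(zip(series, trend))
    if trend_value is not None
]
```

Because the synthetic series contains no random shocks, the irregular component comes out as zero; with real data it would hold the unpredictable remainder.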
Define data governance; i.e., managing the availability, usability, integrity, and security of data
Data governance comprises the overall management of data within an organization. Without a well-designed and functioning data governance program, data can be corrupted, devalued, rendered unusable, lost, or even stolen. A data governance plan should include an oversight body, a set of procedures and controls, and a set of policies or directives to implement the procedures and controls.
> Data stewards should be given primary responsibility over their data's availability, usability, integrity, and security.
> Data governance is concerned primarily with managing the:
- Availability [proper fault tolerance and redundancies built into information systems, uninterruptible power supplies and backup generators, backup and tested backup procedures, and real-time mirroring],
- Usability,
- Integrity [proper segregation of duties, data change management and authorization structures, and independent checks and audits], and
- Security of Data [e.g. access control matrix, firewalls and other network security tools, data encryption, and patch management, AND Input, Processing, and Output controls]
Input, processing, and output controls aid data stewards in maintaining data quality.
a. Input controls include data entry controls, such as proper data input screen or form design, field checks, limit checks, completeness checks, validity checks, and batch totals.
b. Processing controls include data matching, proper file labels, cross-footing balance tests, and concurrent update controls.
c. Output controls include user review of output, reconciliations, and data transmission controls.
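Several of the input controls listed above can be illustrated with a small validation routine for a timesheet record. Everything here is a hypothetical example: the field names, the list of valid departments, and the 60-hour limit are assumptions chosen for the sketch.

```python
# Hypothetical reference data for the checks below.
VALID_DEPARTMENTS = {"SALES", "SHIPPING", "INVENTORY"}
REQUIRED_FIELDS = ("employee_id", "department", "hours")
MAX_WEEKLY_HOURS = 60


def validate_record(record):
    """Return a list of input-control violations (an empty list means clean)."""
    errors = []
    # Completeness check: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            errors.append(f"completeness: {field} is missing")
    if errors:
        return errors
    # Field check: the data must be of the expected type.
    if not str(record["employee_id"]).isdigit():
        errors.append("field: employee_id must be numeric")
    hours = record["hours"]
    if not isinstance(hours, (int, float)):
        errors.append("field: hours must be a number")
    # Limit check: the value must fall within a predetermined range.
    elif not 0 <= hours <= MAX_WEEKLY_HOURS:
        errors.append("limit: hours outside 0 to 60")
    # Validity check: the value must exist in a list of accepted values.
    if record["department"] not in VALID_DEPARTMENTS:
        errors.append("validity: unknown department")
    return errors


def batch_total_matches(records, expected_total_hours):
    """Batch total: compare the entered total with an independently computed one."""
    return sum(r["hours"] for r in records) == expected_total_hours
```

The batch total is checked separately from the record-level checks because it catches records that are individually valid but were dropped or keyed twice.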
Define data mining
Data mining is the process of extracting information from large volumes of data by applying analytic tools to large data sets. In essence, data mining involves querying a lot of data. It is the process of finding anomalies, outliers, trends, patterns, and correlations within large data sets to predict outcomes. Using a broad range of techniques, you can use this information to increase revenues, cut costs, improve customer relationships, reduce risks, and more.
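Finding outliers is one of the simplest data mining tasks to illustrate. The sketch below flags values that lie more than a chosen number of standard deviations from the mean (a z-score rule). The threshold of 2 and the transaction amounts are hypothetical, and real data mining tools use far richer techniques on far larger data sets.

```python
from statistics import mean, stdev


def find_outliers(values, threshold=2.0):
    """Return the values whose z-score exceeds the threshold in absolute terms."""
    centre = mean(values)
    spread = stdev(values)
    return [v for v in values if abs(v - centre) / spread > threshold]


# Hypothetical transaction amounts; one is far from the rest.
amounts = [100, 102, 98, 101, 99, 103, 97, 250]
suspicious = find_outliers(amounts)
```

A flagged value is only a lead for further investigation; the rule says the amount is unusual, not that it is wrong.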
Describe the progression of data, from data to information to knowledge to insight to action
Data → Information → Knowledge → Insight → Action
> Data by itself is not very helpful; to be useful, it must be organized into information.
> Organized data creates meaning that can be used as information.
> Knowledge arises when a person obtains information and competencies through education and experience. In other words, a person takes information and uses his or her own internal processes to convert it to knowledge.
> Insight arises when people delve into their knowledge bases to make connections or see patterns that are not readily apparent. Insight can also be thought of as judgment.
> The true value of data, information, knowledge, and insight lies in making informed and rational decisions or taking action.
