Part 1: Section F
Communication practices
- Form follows function: "knowing what you need to show [function] will help determine how you want to show it [form]"
- Substance over form
- Quality over quantity
Most effective = "Medium matches message." Many types of media exist to send and receive a message, including face-to-face communication, telephone calls, regular meetings and electronic meetings (videoconferencing), memos, letters, and reports. The goal is to match the media richness with the message situation. Otherwise, mismatching occurs, which can lead to confusion and embarrassment. Here, the medium is the "channel" and the message is the "results."
Example: Reporting bad news (e.g., laying off employees) is better done face-to-face than having employees read it in an email or learn it from a third party (e.g., the newspaper).
Record Retention
- To ensure data security
Concerns of a data retention policy:
- The ability to retrieve data records
- The ability to locate data records
- The ability to meet laws and regulations
[NOTE: it is NOT a concern of the policy to PROCESS data records]
Purpose:
- Make sure that an organization complies with applicable legal and regulatory requirements [meets laws and regulations]
- Can also safeguard the organization against the loss of key strategic information
Limitation of Data Analytics Visualization
- Visualization can oversimplify analysis results and lead to reliance on the visual, as opposed to the true meaning of the output.
Foresight vs. Insight vs. Hindsight
Hindsight means PAST: showing historical results through various reports. Insight means PRESENT: showing current results based on action triggers. Foresight means FUTURE: compliance awareness, strategic planning, and operations planning.
Different data visualizations
Histograms = show the distribution of numerical data.
Boxplots = show the distribution of data by displaying the QUARTILES in which data occur; based on the 5-number summary [minimum, lower quartile, median, upper quartile, and maximum].
Scatterplots = show how two variables are "related."
Dot plots = similar to a "histogram" and used when "small" values fall into discrete bins/categories.
Tables = list information in rows and columns.
Dashboards = interactive, quick, comprehensive summary views of key performance indicators.
Bar charts = used with categorical data to show the proportion of data in each category with "horizontal [numerous data] / vertical bars [small data]."
Pie charts = used with categorical data to show the proportion of data in each category with slices in a circle.
Line charts = used to show a series of data points for one variable.
Bubble charts = an enhancement of a "scatter" chart wherein an additional dimension of the data is shown by the size of the circle.
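A minimal sketch of a few of these chart types, assuming Python with numpy and matplotlib installed; the monthly_sales values are randomly generated purely for illustration:

```python
# A minimal sketch, assuming numpy and matplotlib are available.
# The "monthly_sales" data below is made up purely for illustration.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
monthly_sales = rng.normal(loc=100, scale=15, size=200)  # hypothetical data

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].hist(monthly_sales, bins=20)                     # histogram: distribution of numerical data
axes[0].set_title("Histogram")
axes[1].boxplot(monthly_sales)                           # boxplot: quartiles / 5-number summary
axes[1].set_title("Boxplot")
axes[2].scatter(monthly_sales[:-1], monthly_sales[1:])   # scatterplot: relation between two variables
axes[2].set_title("Scatterplot")
plt.tight_layout()
plt.show()
```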
Bitcoin and Blockchain & Hash chain
Involved technologies: blockchain and hash chain. Blockchain is a fully distributed and decentralized ledger (similar to a stock ledger) that is synchronized through consensus between entities or parties. It facilitates fully decentralized operations, processes peer-to-peer transactions, and is tamper-evident and tamper-resistant. Blockchain uses a cryptographic hash function (e.g., the secure hash algorithm) and a digital signature algorithm in creating a continuous list of records (blocks) that are linked and secured. Hash functions are computational functions that take a variable-length data input and produce a fixed-length result (output) that can be used as evidence representing the original data.
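A minimal sketch of how a hash function can link records into a tamper-evident chain, assuming Python's standard hashlib and json modules; the block contents are hypothetical:

```python
# A minimal hash-chain sketch using Python's standard library.
# The "records" and their contents are hypothetical illustration data.
import hashlib
import json

def sha256_hex(data: bytes) -> str:
    # Variable-length input -> fixed-length (256-bit) output
    return hashlib.sha256(data).hexdigest()

chain = []
previous_hash = "0" * 64  # placeholder for the first ("genesis") block
for record in ["pay 5 BTC to A", "pay 2 BTC to B"]:
    block = {"data": record, "prev_hash": previous_hash}
    block_hash = sha256_hex(json.dumps(block, sort_keys=True).encode())
    chain.append({**block, "hash": block_hash})
    previous_hash = block_hash  # each block is linked to the one before it

# Tampering with any earlier block changes its hash and breaks every later link,
# which is what makes the chain tamper-evident.
print(chain)
```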
Blockchains
It is a distributed, digital ledger of economic transactions. It is distributed in that it is a decentralized, public database. The ledger keeps track of all transactions within a peer-to-peer network. The digital record of transactions is widely distributed but cannot be copied. The digital record cannot be altered once recorded and therefore requires new transactions to update or change information. Blockchain has been used to develop a new type of contractual agreement called smart contracts. Smart contracts based on blockchain technology allow for contractual terms to be completed without involving third parties. That is, agreed-upon terms can be verified and carried out using a distributed digital ledger whereby the transactions are trackable and unalterable and are executed and enforced based on computerized protocols.
Types:
1. Public
2. Private
3. Consortium
Data Analysis
1. Exploratory data analysis provides a preliminary (initial) analysis of data to gain insights about data patterns and relationships. A standard frequency distribution of classes or categories is used in exploratory data analysis. Box plots and stem-and-leaf diagrams are used in exploratory data analysis work.
2. Confirmatory data analysis provides the final statistical inferences and conclusions about data patterns and relationships based on the results from exploratory data analysis. Exploratory data analysis is done first and confirmatory data analysis is done next.
3. Continuous data analysis deals with "fractional" values (i.e., not whole numbers) that are measured.
4. Discrete data analysis deals with "integer" values (i.e., whole numbers) that are counted.
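A minimal sketch of exploratory analysis followed by confirmatory analysis, assuming Python with numpy and scipy installed; the two regional sales samples are randomly generated for illustration:

```python
# A minimal sketch: explore first, then run a confirmatory test.
# The two samples are randomly generated, purely hypothetical data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
region_a = rng.normal(loc=100, scale=10, size=50)   # hypothetical sales, region A
region_b = rng.normal(loc=105, scale=10, size=50)   # hypothetical sales, region B

# Exploratory: summarize the data and look for patterns (quartiles, means)
print("A:", np.mean(region_a), np.percentile(region_a, [25, 50, 75]))
print("B:", np.mean(region_b), np.percentile(region_b, [25, 50, 75]))

# Confirmatory: formal statistical inference about the suspected difference
t_stat, p_value = stats.ttest_ind(region_a, region_b)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```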
Prescriptive Analytics
Prescriptive analytics gives the answer to "what must we do to get a specific desired output?" This allows an organization to alter its behavior to achieve desired goals.
Compensating Controls
A compensating control is deployed to augment or enhance existing controls.
Example: Adding encryption of data in transit (e.g., HTTPS) for data that was PREVIOUSLY encrypted only at rest is a compensating control.
Data dictionary or ERD (Entity Relationship Diagram)
A repository commonly used to store critical data such as system relationships, data types, formats, and sources. A database schema is the collection of data definitions that specify a database.
Zero-day Warez [negative day]
A zero-day warez (negative day) refers to software, games, videos, music, or data unlawfully released or obtained on the day of public release. Either a hacker or an employee of the releasing company is involved in copying on the day of the official release.
Cloud Computing
Cloud computing is a NETWORK OF REMOTE SERVERS that are connected by the Internet. The remote servers are used to store, manage, and process data. It represents a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.
The cloud computing environment deals with:
- Security DOWNSIDE issues. Example: system complexity
- Security UPSIDE issues. Examples: staff specialization, platform strength, backup and recovery
Benefits:
- Rapid elasticity: resources can automatically increase
- On-demand self-service
- Resource pooling
Data Warehouse
- Data contained in a data warehouse is "redundant" [simply put, a DW is nonnormalized]
- A DW will contain "multiple" databases from different sources [rather than a single large database]
- A DW is a culmination of data from "multiple" sources
- Data in a DW is "nonvolatile and stored as read-only"
- Data in a DW is mostly historical
Virtual Private Network (VPN)
Companies can establish direct private network links among themselves or create private, secure Internet access, in effect a "private tunnel" within the Internet. It is a network that is established by an organization to allow certain employees to access organizational information in a secure environment even if they are not physically on-site.
Data Analytics
Data analytics is a subset of business intelligence. It can be descriptive, diagnostic, predictive, and prescriptive. More importantly, it is about gaining insights on the data. Data analytics processes data into information by organizing data and using analysis techniques to identify and understand relationships, patterns, trends, and causes. Data analytics can help individuals develop and refine information into knowledge, which requires human understanding. Increased understanding may lead to insight or judgment as individuals recognize connections that are not readily apparent. Data >> Information >> Knowledge >> Insights
Radar Chart
Displays the difference between actual and expected performance and shows comparative rankings of several related measures. The radar chart is a visual method to show in graphic form the size of gaps in a number of areas, such as current performance versus ideal (expected) performance and current budget versus previous budget.
Exploratory Data Analysis (EDA)
EDA is an approach involving the use of various visualization techniques to let the data reveal interesting information about itself.
ETL
ETL (extract, transform, load) is the common process that occurs periodically to UPDATE a data warehouse: extract from the source, transform into the proper input format, and then load the transformed data into the data warehouse.
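A minimal ETL sketch, assuming Python with pandas installed, a hypothetical sales.csv source file, and a hypothetical warehouse.db SQLite data warehouse:

```python
# A minimal ETL sketch. "sales.csv", its columns, and "warehouse.db"
# are hypothetical names used only for illustration.
import sqlite3
import pandas as pd

# Extract: pull raw records from the source system
raw = pd.read_csv("sales.csv")

# Transform: clean and reshape into the warehouse's input format
raw["order_date"] = pd.to_datetime(raw["order_date"])
raw = raw.dropna(subset=["customer_id", "amount"])
summary = raw.groupby(["customer_id", "order_date"], as_index=False)["amount"].sum()

# Load: append the transformed data into the data warehouse table
with sqlite3.connect("warehouse.db") as conn:
    summary.to_sql("fact_sales", conn, if_exists="append", index=False)
```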
AIS
An accounting information system (AIS) is a formalized process to collect, store, and process accounting information. An AIS captures the pertinent information and recordkeeping needed in order to produce financial statements AND performance reports [budgets, etc.]. A primary function of an AIS is to report information that is ACCURATE and TIMELY.
Collect > Store > Process
This is one of the key advantages of an AIS and its influence on the value chain of its customers, suppliers, and distributors.
Data Warehouse
It is a place where databases are stored [storage of data] so that they are available when needed. It is used to aggregate data from multiple sources into a CENTRAL, integrated data repository. Data warehouses are used to analyze business activities and operations.
Big Data
Refers to data sets that are too large and complex for traditional database applications, including structured and unstructured data. The data is used to gain insight into relationships and associations to improve decision making and strategy.
4 Dimensions:
1. Volume: refers to the SCALE of the data.
2. Velocity: refers to the SPEED at which data is generated and analyzed.
3. Variety: big data comes in many different forms. It can refer to traditional relational databases, videos, tweets, emails, and more.
4. Veracity: refers to the truthfulness or accuracy of data.
!: VARIETY is the MAJOR technical driver of INVESTMENT in big data because more variety means more insights, more decisions, and more opportunities.
!: SOURCING of data requires EXTRA attention.
Robotic Process Automation
The use of software to complete routine, repetitive tasks. RPA is typically used in settings with high volumes of routinized actions. Robots can be employed to manipulate data, record transactions, process information, and perform many other business and IT processes.
Drawbacks:
- If rules or processes change, then the RPA system requires updating
- The initial investment to develop a rule-based system that automates workplace tasks can be costly [significant cost and time to understand, DEFINE, test, verify, and audit the automated process]
Types of Analytical Model
1. Clustering: a data analysis technique that involves grouping similar objects together. Clustering is focused on the discovery and identification of patterns of similarity and dissimilarity. Clustering is an exploratory technique that provides insight into common properties or characteristics. It typically does not start with predefined groupings or categories.
2. Classification: a data analysis technique which attempts to predict which category or class an item belongs to. Classification typically begins with predefined categories and then attempts to sort an item into one of those categories.
3. Regression Equation: a data analysis method of analyzing the correlation of an outcome (dependent variable) with explanatory or independent variables.
4. Time Series: a data analysis method that considers data points over time. Time-series analysis can identify patterns in observed data. Trends provide useful information to help predict future outcomes based on what has happened in the past. Common trends include:
> systematic trends (such as prolonged upward or downward movements),
> cyclical trends (such as macroeconomic cycles that rise and fall),
> seasonal trends (such as periodic spikes in retail shopping around holidays), and
> irregular trends (such as erratic fluctuations due to unforeseen events like natural disasters).
5. Exploratory Data Analysis: used to summarize the characteristics of a dataset. This technique often involves using visual methods to examine the data for patterns or anomalies. It is used to help identify and discover what hypotheses to test. Exploratory data analysis is the initial step that is used to inform future data collections and statistical tests.
6. Sensitivity Analysis: the exploration of how a dependent variable might be affected given different possible sets of independent variables.
7. Simulation Models: based on computational algorithms that follow specified assumptions to model possible outcomes. They are based on random sampling done repeatedly in order to form probability distributions for the outcome of interest.
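A minimal sketch of two of these models (clustering and classification), assuming Python with numpy and scikit-learn installed; the customer attributes and labels are randomly generated for illustration:

```python
# A minimal sketch of clustering (no predefined groups) versus
# classification (predefined categories). All data is hypothetical.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                 # hypothetical customer attributes

# Clustering: discover groupings without predefined categories
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Classification: predict a predefined category (e.g., "will churn" yes/no)
y = (X[:, 0] + X[:, 1] > 0).astype(int)       # hypothetical labels
model = LogisticRegression().fit(X, y)

print(clusters[:10], model.predict(X[:5]))
```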
Life Cycle of Data
1. Data Capture: data is generated/recorded through [data entry, data acquisition, signal reception/IoT].
2. Data Maintenance: the cleaning and scrubbing of data, done in an Extract-Transform-Load procedure. It is done to ensure data's usability and integrity.
3. Data Synthesis: creates new data by using existing data and deriving logical outputs. It uses inductive logic to derive information from data. Data synthesis involves combining experience, judgments, or opinions to provide logical interpretations of existing data points.
4. Data Usage: simply employing data within the business activities and processes of an organization.
5. Data Analytics: specialized stage wherein data is used in models to facilitate pattern recognition and evaluation of correlations and associations.
6. Data Publication: involves making data available outside of an organization [can be intentional/accidental].
7. Data Archiving: data is set aside from active usage. Data that is archived is stored for future use.
8. Data Purging: final stage for data; involves data being deleted or removed from existence. Data purging is meant to be permanent erasure or removal of data and its copies.
Challenges of Data Mining
1. Data quality - if data has missing values, errors, or an insufficient sample size, it can limit the ability to draw inferences from it. Sometimes data can be biased.
2. Data quantity - data mining can involve enormous datasets. Analyzing large datasets can require computational power beyond the capabilities of a standard computer, and such software and hardware can be expensive.
3. Data variety - data is not all structured in the same way. This makes analysis difficult because the data needs to be transformed into a common format for better analysis.
4. Data insights - data mining techniques can produce large quantities of output. Specialized knowledge about the data context is required to properly draw and interpret conclusions.
KEY challenge: Distributed Data - real-world data is usually stored on different platforms in distributed computing environments. It could be in databases, individual systems, or even on the Internet. It is practically very difficult to bring all the data to a centralized data repository, mainly due to organizational and technical reasons.
Ideal use case for an analyst to mine large data sets:
- To detect suspicious financial activities with a high potential risk.
Types of Data Analytics
1. Descriptive [what happened?] - Descriptive data analysis is observational and reports the characteristics of historical data. It describes statistical properties such as the mean, median, range, or standard deviation.
2. Diagnostic [why did it happen?] - Diagnostic data analysis looks at correlations, the size and strength of statistical associations. It can help identify empirical relationships that may be unknown or uncertain; data is explored to find meaningful statistical associations.
3. Predictive [what will happen?] - Predictive analysis builds upon descriptive and diagnostic analytics to make predictions about future events. It can take the form of what-if analysis where sets of possible FACTS, their LIKELIHOODS, and RANGES are used to formulate potential future outcomes.
4. Prescriptive [what SHOULD we do?] - Prescriptive analytics draws upon the other forms of data analytics to infer or recommend the best course of action. Prescriptive analysis can take the form of optimization or simulation analyses to identify and prescribe the actions to undertake to realize the best or most desired result.
Characteristics of Cloud Computing
1. Measured service: resources are measured and operated on a "pay-per-use" or charge-per-use basis. Resource usage can be monitored, controlled, and reported, thus providing transparency for both the user and provider of the utilized service.
2. Resource pooling: the cloud provider's and vendor's computing resources are pooled to serve multiple users and consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to user-consumer demand. Examples of pooled resources include storage, processing, computer memory, and network bandwidth.
3. Rapid elasticity: the cloud provider's capabilities and resources can be expanded or restricted based on user-consumer demand for such capabilities and resources.
4. On-demand self-service: a user-consumer can unilaterally provision computing capabilities, such as server time and network storage amount, as needed automatically without requiring human interaction with each service provider.
ISACA COBIT 5 Principles
1. Meeting Stakeholder Needs [enterprises exist to create value]
2. Covering the Enterprise End-to-End
3. Applying a Single Integrated Framework
4. Enabling a Holistic Approach
5. Separating Governance From Management
Example of how to turn raw data into actionable information
1. Pivot Table: a pivot table is a second, revised table in rows and columns containing reformatted data using the raw data from the first, original table in rows and columns. A pivot table is also called a pivot chart. The pivot table contains sorted, rearranged, and summarized data, providing better insights.
2. Contingency Table: a contingency table is a type of table presented in a matrix format displaying the frequency distribution of variables and their probabilities. Contingency tables (cross-tabulations) are used in business intelligence, market research, and customer surveys where interrelations and interactions between two or more variables can be studied to obtain greater insights from data.
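A minimal sketch of both table types, assuming Python with pandas installed; the small sales dataset is made up for illustration:

```python
# A minimal sketch of a pivot table and a contingency table (cross-tabulation).
# The raw sales data below is hypothetical.
import pandas as pd

raw = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "East", "West"],
    "product": ["A", "B", "A", "B", "A", "A"],
    "sales":   [100, 150, 200, 120, 90, 180],
})

# Pivot table: raw rows reorganized into a summarized rows-by-columns view
pivot = pd.pivot_table(raw, values="sales", index="region",
                       columns="product", aggfunc="sum")

# Contingency table: frequency of each region/product combination
contingency = pd.crosstab(raw["region"], raw["product"])

print(pivot, contingency, sep="\n\n")
```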
Types of Blockchain
1. Private: a private blockchain limits access to users authorized by the owning organization, such as within an organization.
Example: the manufacturing process of automobiles from parts receipt to the rollout of final products.
2. Public: a public blockchain provides easy access to publicly available data from a variety of sources. Confidential data could be placed on the blockchain after being encrypted to protect confidentiality.
Example implementation: a consumer app to track packages from a variety of producers, shipped using different vendors.
3. Consortium: best suited for multiple organizations that work together as business partners and need to share some information.
AIS Cycles:
1. Revenue Cycle: process of taking orders, shipping products or delivering services, billing customers, and collecting cash from sales.
2. Purchasing Cycle: process of placing orders, receiving shipment of products or delivery of services, approving invoices, and making cash payments.
3. HR and Payroll Cycle: process of recruiting, interviewing, and hiring personnel, paying employees for their work, promoting employees, and finalizing employees' status from retirements, firings, or voluntary terminations.
4. Production Cycle: process by which raw materials are converted into finished goods.
5. PPE Cycle: process of acquiring resources (e.g., land, buildings, and machinery) needed to enable an organization's business activities.
6. Financing Cycle: process of obtaining funding, through debt or equity, to run an organization's activities and to purchase PPE, servicing the financing, and ultimately repaying financial obligations.
7. GL and Reporting Systems: process of recording, classifying, and categorizing an organization's economic transactions and producing summary financial reports.
Cyberattacks
1. Scanning attack: sending network packets or requests to another system to gain information to be used in a subsequent attack.
2. Skimming attack: the unauthorized use of a reader to read tags without the authorization or knowledge of the tag's owner or the individual in possession of the tag. An example of a skimming attack is on radio frequency identification (RFID) tags.
3. Smurf attack: occurs when a hacker sends a request for information to the special broadcast address of a network attached to the Internet. The request sparks a flood of responses from all the nodes on this first network. The responses are then sent to a second network that becomes a victim. If the first network has a larger capacity for sending out responses than the second network is capable of receiving, the second network experiences a denial-of-service (DoS) problem as its resources become saturated or strained.
4. Malvertizing attack: the use of malicious advertisements (ads) on legitimate websites. These ads contain programming code that will infect a user's computer without any action required from the user (i.e., the user does not have to click on the ad to become infected). Adware, which is a form of malware, can conduct malvertizing attacks. Adware tracks a user's activity and passes it to third parties without the user's knowledge or consent. Click fraud is possible with adware because it involves deceptions and scams that inflate advertising bills with improper usage and charges per click in online web advertisements.
Safeguards Against Cyberattacks
1. Vulnerability Testing and Penetration Testing
2. Biometrics
3. Firewalls
4. Access Controls

1. Vulnerability testing is used to identify existing vulnerabilities. In contrast, penetration testing is undertaken to actively exploit potential weaknesses in a system. Penetration tests assess the POSSIBILITY of breaches and their potential SEVERITY. Penetration tests help determine whether a vulnerability is genuine and what the potential resulting damages could be.
2. Biometrics: the use of physical features and measurements for identity verification. Biometrics can include using fingerprints, facial recognition, and even stride pattern to verify individuals. Biometric authentication can safeguard data by preventing unauthorized access.
3. Firewalls: used in computer networks to prevent unauthorized users from gaining access to a network. Firewalls monitor the incoming and outgoing traffic on a network and place a barrier around network systems and databases. Firewalls are security rules that establish whether a source is trusted and therefore can gain access to the network, or is not trusted and therefore is denied access. Firewalls can take the form of hardware, software, or a combination of both.
4. Access controls place limits on who can access a place or a resource. Access controls can be physical or logical.
- Physical access controls restrict who can enter geographic areas, buildings, and/or rooms.
- Logical access controls restrict which individuals can connect to computer networks, system files, and data.
Examples of access controls: passwords, personal identification numbers (PINs), credentials, or other authorization and authentication forms.
Cloud Computing Deployment Models
> Community Cloud Model: provisioned for exclusive use by a specific community of consumers from organizations that have shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be owned, managed, and operated by one or more organizations in the community, a third party, or some combination of them. It may exist ON/OFF the premises of a cloud provider.
> Private Cloud Model: provisioned for exclusive use by a single organization comprising multiple consumers (e.g., multiple business units). It may be owned, managed, and operated by the organization, a third party, or some combination of them. It may exist ON/OFF the premises of a cloud provider.
> Hybrid Cloud Model: a composition of two or more distinct cloud models (i.e., private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between different types of clouds). It may exist ON/OFF the premises of a cloud provider.
> Public Cloud Model: can be provisioned for open use by the general public. It may be owned, managed, and operated by a business, academic, or governmental organization, or some combination of them. The nature of cloud computing infrastructure is that there is a sense of location independence in that the user-consumer generally has no detailed knowledge of or control over the exact location of the cloud vendor's provided resources. This means that the resources (e.g., hardware, software, and staff) can come from anywhere and at any time. It exists ON the premises of a cloud provider.
Data Governance Frameworks
A. COSO's Internal Control-Integrated Framework
The COSO internal control framework helps companies visualize various dimensions of internal control [why, what, where]. Internal controls must be implemented to ensure proper data governance. According to the framework, there are five components of internal control:
> Control environment - internal control over data governance depends on good leadership and culture.
> Risk assessment - companies need to identify risks to data governance.
> Control activities - these are the specific policies and procedures put in place to ensure data governance.
> Information and communication - ensuring proper internal controls over data governance improves information quality throughout the company.
> Monitoring activities - companies need to monitor and adapt controls to respond to changes in the environment.
B. ISACA's COBIT
COBIT guides information technology (IT) management and governance. COBIT provides not only a framework, but also a variety of resources, technical guides, and trainings. COBIT focuses specifically on:
1. Security,
2. Risk management, and
3. Information governance.
Structured vs. Unstructured Data
A. Structured Data:
- Has a pre-defined data model, which can be displayed in rows, columns, and relational databases
- Estimated to be about 20% of enterprise data
- Easier to manage and protect with legacy solutions
Resides in = data warehouses and relational database management systems/RDBMS [SQL, Oracle DBv11, IBM DB2, Sybase, MS SQL, MariaDB]
Examples = dates, phone numbers, social security numbers, credit card numbers, customer names, addresses, transaction information
B. Unstructured Data:
- Has no pre-defined data model, hence it cannot be displayed in rows, columns, and relational databases
- Estimated to be about 80% of enterprise data
- More difficult to manage and protect with legacy systems
Resides in = applications, data lakes, and non-relational databases [MongoDB, Apache HBase, IBM Domino, Oracle NoSQL, CouchDB, Riak, InfoGrid, InfiniteGraph]
Examples = text files, audio, video, emails, images, surveillance imagery, tweets
Smart contracts
A.k.a. cryptocontract, a smart contract is a computer protocol intended to digitally enforce the negotiation/performance of a contract. It is an agreement between two people in the form of computer code.
Characteristics:
- Self-verifying
- Self-enforcing
- Tamper-proof
Smart contracts can self-verify the conditions that are placed inside a contract by interpreting data. Smart contracts are self-enforcing computer programs that can create legally binding rights and obligations for their parties. As digital programs, based on the blockchain consensus architecture, they self-execute when the terms of the agreement are met, and due to their decentralized structure they are also self-enforcing.
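A highly simplified sketch of the self-verifying/self-enforcing idea, written in plain Python rather than an actual blockchain contract language; the contract terms and parties are hypothetical:

```python
# A simplified, hypothetical sketch of the smart-contract idea in plain Python:
# the code checks the agreed condition against data and executes automatically,
# with no third party involved. Not an actual blockchain implementation.
from dataclasses import dataclass

@dataclass
class DeliveryContract:
    buyer: str
    seller: str
    price: float
    executed: bool = False

    def settle(self, delivery_confirmed: bool) -> str:
        # Self-verifying: the condition is checked from data, not by a third party
        if delivery_confirmed and not self.executed:
            self.executed = True  # self-enforcing: payment released by the code itself
            return f"Transfer {self.price} from {self.buyer} to {self.seller}"
        return "Condition not met; no transfer"

contract = DeliveryContract(buyer="Alice", seller="Bob", price=500.0)
print(contract.settle(delivery_confirmed=True))
```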
Analytical Models
Analytic models are an empirical or data-driven approach to understanding connections [relationships among variables], effects [between variables and the outcome], and outcomes [the total effect]. Analytic models can provide insight by validating with statistical techniques that effects do exist and by providing estimates of effect sizes. Analytic models can also help with the discovery of potentially unexpected relationships.
Artificial Intelligence
Artificial intelligence involves computers performing tasks "requiring" CRITICAL ANALYSIS and PATTERN RECOGNITION. For example, artificial intelligence can be used to recognize speech or textual patterns. It can be used to analyze various inputs and provide a RECOMMENDED decision. Artificial intelligence (AI) also involves a learning aspect. That is, the computer can learn from prior information processing experiences and revise and update future processing, much like a human would do. AI can augment human analysis and decision making by processing information more quickly and in larger quantities than the human mind can. AI can thereby help discover patterns and trends. While a skilled accountant is needed to help define the parameters and metrics of interest, AI can be trained through repetitive data processing to perform many accounting functions.
Attributes of Data
1. Availability = refers to the ability to make data ACCESSIBLE when it is needed and where it is needed. Data that is available can be easily accessed in a timely manner.
2. Usability = refers to data being delivered to end users in "formats and structures" that allow the completion of desired business activities, analysis, and operations. Usable data can be successfully integrated and processed in software and applications desired by end users.
3. Integrity = refers to the ACCURACY and CONSISTENCY of data. Data must be reliable in order to support proper inferences and decisions. Data governance policies should provide specific safeguards and protections to ensure that data is accurate and valid.
4. Security = governs how data is protected from unauthorized access and from possible data corruption [such as through data encryption and physical barriers]. Secure data is protected against "accidental" or "intentional" modifications, removals, or disclosures.
Business Intelligence
Business intelligence (BI) leverages software and services to transform data into actionable insights that inform an organization's strategic and tactical business decisions. BI tools access and analyze data sets and present analytical findings in reports, summaries, dashboards, graphs, charts, and maps to provide users with detailed intelligence about the state of the business. BI combines business analytics, data mining, data visualization, data tools and infrastructure, and best practices to help organizations make more data-driven decisions. Business intelligence is descriptive, telling you what is happening now and what happened in the past to get to that state. Business analytics, on the other hand, is an umbrella term for data analysis techniques that are predictive (they can tell you what is going to happen in the future) and prescriptive (they can tell you what you should be doing to create better outcomes).
Business Process Analysis
Business process analysis is used to evaluate and improve core business processes. It takes a fresh view of the process and asks how it could be done differently with greater speed or with greater effectiveness. The analysis involves gathering information about the current process and understanding what its objectives are. New ideas and alternatives are then identified and evaluated on whether the process can be altered while still achieving the same objectives.
Examples:
1. Hands-on observation of how a process is performed.
2. In-depth interviews with employees adept with the process.
3. Identifying value-added components of the process and evaluating whether non-value-adding activities can be changed or removed.
4. Conducting time-and-motion studies to assess the efficiency of a process.
5. Mapping, diagramming, and flowcharting processes.
Data Structure
Data can be structured/organized in several ways:
1. Structured Data: has fixed fields, such as data organized in a spreadsheet with column/row identifiers. It is easily searchable because it is organized and identifiable. Examples: data in data warehouses and databases.
2. Semi-structured Data: does NOT have neat, organized fixed fields like structured data, but may still contain organizing features such as tags/markers. Though it does not have the "formal structure" of a relational database, it has features which allow it to have classifications and groupings. Examples: emails and web pages on the internet.
3. Unstructured Data: is unorganized and is NOT easily searchable. It is often text-based, like human speech, rendering it difficult to categorize and organize into predefined, set data fields. Examples: Twitter feeds, text messages, photos, videos, and data in disconnected computer systems.
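A minimal sketch contrasting the three forms, assuming plain Python; all values are hypothetical:

```python
# A minimal sketch of the three forms of data. All values are hypothetical.

# Structured: fixed fields (rows/columns), easily searchable
structured = [
    {"customer_id": 1, "name": "Alice", "balance": 250.00},
    {"customer_id": 2, "name": "Bob",   "balance": 900.50},
]

# Semi-structured: no fixed schema, but tags/markers give it some organization
semi_structured = """
<email>
  <from>alice@example.com</from>
  <subject>Invoice 1042</subject>
  <body>Please find the invoice attached.</body>
</email>
"""

# Unstructured: free-form text (or audio/video/images) with no predefined fields
unstructured = "Called the customer today; they were happy with the delivery."
```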
Data governance
Data governance is a set of DEFINED procedures, policies, rules, and processes [PPRP] that oversee the following attributes of an organization's data: availability, usability, integrity, and security.
It is the people, processes, and information technology required to create consistent and proper handling of an organization's data across the business enterprise.
NOTE: Data management is a SUPERset of data governance and entails all aspects of managing an organization's data.
Data steward: the one responsible for ensuring that data governance policies are followed.
Data owner: holds the "ultimate" responsibility for the data.
Data visualization
Data visualization is the creation, analysis, and evaluation of data presented in visual forms. These forms can include but are not limited to the following: charts, graphs, diagrams, pictures, dashboards, infographics, tables, or maps It helps identify patterns, trends, and outliers. It can provide understanding about correlations and relationships among critical success factors for an organization's activities or for industry and market behaviors.
DBMS vs. Data Warehouses
Data warehouses:
- A data warehouse is a subset of a DBMS
- Data redundancies are prevalent
DBMS:
- May house multiple databases from multiple sources to make up a data warehouse
- A DBMS can be configured to be a data warehouse
- Data redundancies are controlled
- Security controls exist to prevent unauthorized access
Software as a Service (SaaS)
Delivers applications over the cloud using a pay-per-use revenue model. Software as a service (SaaS) allows users to connect to and use cloud-based apps over the Internet. Rather than building up local computing capabilities and installing software on each local PC, a cloud-based system can make these features available system-wide to each local PC without repetitive installments and equipment. This can reduce IT costs for local servers and computers
Information Systems for Financial and Nonfinancial Data
Historically, financial information was the primary, and often only, information recorded in an AIS. This limitation was primarily a result of technological limitations.
An example of a nonfinancial system that runs in parallel with an AIS that processes financial data is a customer relationship management (CRM) system. This system captures information about sales calls, shipment tracking, and customer profiles. Other systems track nonfinancial information, such as inventory tracking systems, customer sales logs, or human resource information.
An integrated approach, wherein financial and nonfinancial information are linked within a single information system, is an enterprise resource planning (ERP) system. An ERP can link the CRM system to the AIS to reduce potential errors and increase information usefulness.
Maintaining two sets of books will result in:
- data synchronization problems,
- inconsistent results, and
- conflicting decisions.
Examples of major challenges in integrating financial systems and nonfinancial systems include:
- cultural barriers,
- organizational turf protection (organizational politics), and
- lack of trust between the managers who are managing these two diverse systems.
NOTE: Technology is NOT the challenge or problem here, as it can readily integrate these two diverse systems.
Enterprise/Business/Corporate Performance Management (EPM)
It is a PROCESS that facilitates the LINKING of an organization's STRATEGIES to specific PLANS and ACTIONS. The overall process can be broken down into several key sub-processes, including the following:
- Planning, budgeting, and forecasting
- Performance reporting
- Profitability and cost analysis
Goal: to help strategic goals and objectives be COMMUNICATED to "employees" and REFLECTED in "budgets and action plans."
!: EPM is not only a performance tracking system, but also a communication tool.
!: EPM either replaces or augments legacy spreadsheets [which used category-specific spreadsheets].
- Requires reviews and updates on a periodic basis
- Relies on key indicators of performance to inform management decision making and analysis
- Can help identify business and market trends
Benefits:
- Reduced human intervention [MAJOR benefit] because reports are generated automatically
- Improved efficiencies in planning, budgeting, and reporting processes by relying on a centralized database and workflow [collaboration]
- Accelerates cycle times and creates more time for value-added strategic work and analysis [continuous data collection and reporting]
- Aligns finance and operations around a single plan [faster reporting of financial results]
- Automatic distribution of information to stakeholders
ULTIMATE GOAL of EPM = FORESIGHT
Foresight means FUTURE: compliance awareness, strategic planning, and operations planning.
Data Mining
It is about knowledge discovery. It is accessing and examining large data sets through statistical analysis to generate new information. The primary goal is to provide useful information for decision making and anticipating future outcomes. SQL, that is, Structured Query Language, is the standard programming language used to communicate with a database. SQL can be used to perform various functions with databases such as MANIPULATING data, DEFINING data, and ACCESSING data.
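A minimal sketch of using SQL to define, manipulate, and access data, assuming Python's built-in sqlite3 module and a hypothetical in-memory transactions table:

```python
# A minimal SQL sketch using Python's standard sqlite3 module.
# The "transactions" table and its rows are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Defining data
cur.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")

# Manipulating data
cur.executemany("INSERT INTO transactions (customer, amount) VALUES (?, ?)",
                [("Alice", 120.0), ("Bob", 75.5), ("Alice", 40.0)])

# Accessing data
cur.execute("SELECT customer, SUM(amount) FROM transactions GROUP BY customer")
print(cur.fetchall())   # e.g., [('Alice', 160.0), ('Bob', 75.5)]
conn.close()
```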
Cyberattack
It is an attempt by an individual or organization to gain access to the information system or computer of another individual or organization. It is a malicious or deliberate action taken in order to inflict harm on another by altering, disabling, destroying, or stealing electronic information.
Examples:
1. Malware breach. Malware can be:
- Ransomware - blocks access to computer/server functions
- Spyware - gathers/transmits data
- Virus - disrupts system processing
2. Phishing - the false presentation or fraudulent communication of information from a reputable source with the goal of "installing malware" or "stealing valuable personal information."
3. Denial-of-service attack - overloads a network or computer with requests to process information. These requests exhaust the computational processing power or bandwidth of the computer or network and thereby make them unable to process legitimate requests.
Enterprise Resource Planning
It is the INTEGRATED "management" of core business PROCESSES. ERP brings together business functions such as:
- inventory management,
- accounting,
- finance,
- human resources,
- supply chain management, and more.
These functions operate on a shared, central database. This shared database allows employees in one business function to view and access data generated in other business functions.
Benefits:
- Greater synchronization of information across business functions
- Information is available in real time rather than waiting for data to be shared across business functions
- Collaboration and teamwork are encouraged because information is widely distributed
- Employees only need to learn how to use a single system rather than multiple systems
- Lower operational costs by eliminating redundant systems and simplifying system maintenance
An ERP system should be MODULAR in order to accommodate various business models, where each module works independently and in real time or near real time. Integration across these modules should also allow for a seamless flow of data across the various modules.
Database management system
It is the INTERFACE or program between a company's database and the application programs that access the database. The DBMS defines, reads, manipulates, updates, and deletes data in a database. It controls access to the data and maps each user's view of the data (i.e., the DBMS can be programmed to present data in a way that makes sense for each user). It optimizes how data in databases are stored and retrieved. It facilitates an organization's administrative operations related to disaster recovery, regulatory and legal compliance, and performance monitoring. NOTE: DBMS does not function to detect malware
Normalization
It refers to decomposing a database table design into its simplest form. It simplifies database design by removing redundancy and increasing integrity.
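A minimal sketch of the idea, assuming Python's built-in sqlite3 module: the unnormalized design repeats customer details on every order row, while the normalized design splits customers and orders into separate tables linked by a key:

```python
# A minimal normalization sketch using sqlite3; table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")

# Unnormalized: customer name/address repeated (redundant) in every order row
conn.execute("""CREATE TABLE orders_unnormalized (
    order_id INTEGER, customer_name TEXT, customer_address TEXT, amount REAL)""")

# Normalized: redundancy removed, integrity supported via a foreign key
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,
                        name TEXT, address TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customers(customer_id),
                     amount REAL);
""")
conn.close()
```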
Regression vs. Time Series; Panel Data
It should be noted that regression analysis is often done on cross-sectional data, that is, data that takes place within the same time period. It is possible to combine cross-sectional data with time-series data to observe the same phenomena over multiple periods of time. Combined cross-sectional and time-series data is referred to as PANEL DATA.
RPA vs. AI
RPA - a prescribed, rule-based system. RPA can do a great job of handling repetitive, rules-based tasks that would previously have required human effort, but it doesn't learn as it goes like, say, a deep neural network. RPA robots perform the same way every time. They don't learn from one repetition to another, and they will not improvise or come up with a better way of doing their programmed task.
Benefits:
- Accuracy
- Compliance [creates audit trails and follows regulatory rules precisely]
- Speed
- Reliability
- Improved employee morale
AI - is adaptive. Artificial intelligence (AI) is the simulation of human intelligence processes by computer systems, or "machines." Popular applications of AI include image recognition, machine vision, speech recognition, chatbots, natural language generation, and sentiment analysis. The processes include LEARNING, REASONING, and self-correction.
While RPA is used to work in conjunction with people by automating repetitive processes (attended automation), AI is viewed as a form of technology to replace human labor and automate end-to-end (unattended automation). RPA uses structured inputs and logic, while AI uses unstructured inputs and develops its own logic.
Sequence of steps needed when a data analyst is applying "data mining" tools and methods
Raw data >> "Normalized" Data >> Data Extraction >> Data Insights !: Target raw datasets must be cleaned and normalized to remove missing, erroneous, or inappropriate data. Then, the normalized data is extracted or analyzed using several methods such as data analytics, statistical analysis, data mining methods, simulation technique, or forecasting methods to yield new insights.
Application Software testing and its levels in SDLC
Software testing is the process of evaluating a system with the intent of finding bugs. It is performed to check whether the system satisfies its specified requirements.
Levels of Testing:
1. Module/Unit Testing (first level) - a module or component is tested in isolation.
2. Integration Testing (second level) - testing a group of related modules. It aims at finding interfacing issues between the modules, i.e., whether the individual units can be integrated into a sub-system correctly. [Types: top-down, bottom-up, sandwich, or big bang]
!!!: This is the LEAST understood level by developers and end users, since there are numerous ways to conduct it and no base documents to rely upon.
> The formal change control mechanism starts in this testing. After the cutoff point, it will be too late to change, and changes may require reasons and documentation.
---- the CUTOFF POINT for the development project ----
- After this, the following tests are called back-end testing:
3. System Testing (third level) - the level of testing where the complete, integrated application is tested as a whole. It aims at determining whether the application conforms to its business requirements.
4. Acceptance Testing (final level) - aims at ensuring that the product meets the specified business requirements within the defined standard of quality. [Two types: alpha and beta testing]
NOTE:
Unit & Integration Testing = performed by programmers
System Testing = jointly by users and programmers
Acceptance Testing = end users [accounting staff] and IT production staff
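A minimal unit-test sketch (the first level of testing), assuming Python's built-in unittest module; calculate_discount is a hypothetical function under test, defined here so the example is self-contained:

```python
# A minimal unit-test sketch: one module/function tested in isolation.
# "calculate_discount" is a hypothetical function, not from any real system.
import unittest

def calculate_discount(amount: float, rate: float) -> float:
    if not 0 <= rate <= 1:
        raise ValueError("rate must be between 0 and 1")
    return round(amount * (1 - rate), 2)

class TestCalculateDiscount(unittest.TestCase):
    def test_applies_discount(self):
        self.assertEqual(calculate_discount(100.0, 0.1), 90.0)

    def test_rejects_invalid_rate(self):
        with self.assertRaises(ValueError):
            calculate_discount(100.0, 1.5)

if __name__ == "__main__":
    unittest.main()
```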
Visualization Tools and their uses
Spaghetti plot: a workflow visualization in which data flows through a system appear as noodles. The results of a spaghetti plot can be useful in streamlining or simplifying workflow to save resources such as time, money, materials, and energy. It is used to:
a. Track product routing and material movement through a factory;
b. Reduce inefficiencies in an office/factory/warehouse workflow system; and
c. Show the effects of medical drugs on test patients during a new drug trial, among others.
Bullet chart: good for comparing two variables, such as sales dollars by salesperson. Actual sales data and target (quota) sales data can be compared for each salesperson to visualize which salespeople meet the assigned target sales quota.
Layer chart: linear in appearance but with a different representation. It depicts the accumulation of individual facts stacked one over the other to create the overall total.
Histogram: an alternative to a box plot, showing data distributions from a different perspective. A histogram can reveal peak times and non-peak times.
Prescriptive vs. Predictive
Specific "recommendations" are an output of prescriptive analytics versus predictive analytics. Prescriptive analytics provides the user with actions to take in order to experience a desired outcome (e.g., based on known traffic patterns at this time, take this route to work/home).
Schema
A concept or framework that organizes and interprets information. It is a set of specifications that defines a database. Specifically, it includes entity names, datasets, data groups, data items, areas, sort sequences, ACCESS KEYS, and SECURITY LOCKS.
Systems Development Life Cycle (SDLC)
The overall process for developing information systems, from planning and analysis through implementation and maintenance:
1. System Analysis
2. Conceptual Design
3. Physical Design
4. Implementation and Conversion
5. Operations and Maintenance
- System Analysis: evaluative process to assess user needs [the end goals], resource requirements, benefits, and costs. Gathering pertinent information may require surveys of employees, in-depth interviews with system users, documentation of best practices, and data analysis. The analysis phase is synonymous with the assessment of business requirements to VALIDATE that stakeholder NEEDS are being met throughout the project.
- Conceptual Design: process of creating plans for meeting the needs of the organization. Detailed specifications are created to provide instruction on how to achieve the desired system.
- Physical Design: process of identifying the features, specifications, and equipment needed in order to make the system OPERATIONAL.
- Implementation and Conversion: process of making the system design a reality. Hardware and software are installed and tested to ensure proper functionality.
- Operations and Maintenance: process of fine-tuning and refining the system as users explore functionalities and place demands on system processing. New capabilities may be required as users identify needs. Hardware may also need replacement over time due to wear and tear and/or obsolescence. Software may also be updated to improve processing or to add functional options.
Ransomware attack
This refers to data-hijacking. Scammers and data-kidnappers send emails to innocent users that look like courtesy messages from legitimate companies to spread the malicious software (malware) called "ransomware botnet". Ransomware works by encrypting all the files on a user's computer (e.g., photos, documents, and tax refunds) that the user has saved to the hard drive or to any shared folders. Once the files are encrypted, the user will not be able to open the files without the decryption key, which the user can get only from the criminals behind the ransomware. Hackers or criminals hijack the data files and hold the user files "hostage," often encrypting them and demanding payment, typically in bitcoins, for the user to get the files back.
Vulnerability Testing vs. Penetration Testing
Vulnerability scans LOOK/CHECK for known vulnerabilities in your systems and report potential exposures. They do not exploit the vulnerabilities. Scope: wide.
Penetration tests are intended to EXPLOIT weaknesses in the architecture of your IT network and determine the degree to which a malicious attacker can gain unauthorized access to your assets [possibility and severity]. Scope: detailed.
A vulnerability scan is typically AUTOMATED, while a penetration test is a manual test performed by a security professional. Penetration testing requires highly skilled knowledge, which is why it is costly and time-consuming. Vulnerability scanners merely identify potential vulnerabilities; they do not exploit them.
Here's a good analogy: a vulnerability scan is like walking up to a door, checking to see if it is unlocked, and stopping there. A penetration test goes a bit further; it not only checks to see if the door is unlocked, but it also opens the door and walks right in.
Vulnerability scan:
Benefits:
- Quick and high-level look
- Very affordable
- Automatic
- Quick to complete
Limitations:
- False positives
- Does not confirm that a vulnerability is exploitable
Penetration test:
Benefits:
- More accurate and thorough results
- Rules out false positives
Limitations:
- Time
- Cost
Regression Analysis
Y = a + bX
Y = dependent variable
a = fixed costs / intercept
b = slope
X = independent variable
Coefficient of determination [r²] = measures how well a regression line fits the observed data. It can be interpreted as the percent of variation of the DEPENDENT variable that is explained by the variation in the INdependent variables. That is, if r² = 0.73, it means that 73% of the variation in the dependent variable is explained by the variation in the independent variables.
Coefficient of correlation [r] = measures how two variables are related.
Standard error of the estimate = measures the accuracy of predictions.
Goodness of fit = measures how well the "observed data" FIT a "statistical model." That is, it is a summary of the DISCREPANCY between observed values and what the model would predict the values to be.
Confidence interval = a RANGE of VALUES in which the true value lies.
Chi-square [χ²] = the smaller the better. If χ² is farther from zero, then the difference between the actual number and the expected number is larger.
LIMITATION of REGRESSION: it should ONLY be used to make predictions within the RELEVANT RANGE. The relevant range refers to the set of observed values.
> If a desired prediction is outside the set of observed values, there is a possibility that the estimated relationship within the relevant range will not be the same outside the observed values. This is called EXTRAPOLATION - predictions about an unknown observational range are inferred from a known observation range.
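A minimal sketch of fitting Y = a + bX and reading r², assuming Python with numpy and scikit-learn installed; the machine-hours and cost figures are hypothetical:

```python
# A minimal regression sketch; the X (machine hours) and Y (total cost)
# values are hypothetical illustration data.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[10], [20], [30], [40], [50]])      # independent variable
Y = np.array([150, 240, 330, 430, 520])           # dependent variable

model = LinearRegression().fit(X, Y)
a = model.intercept_        # "fixed cost" / intercept
b = model.coef_[0]          # slope (variable cost per unit of X)
r2 = model.score(X, Y)      # coefficient of determination

print(f"Y = {a:.2f} + {b:.2f}X, r^2 = {r2:.3f}")
# Predictions should stay within the relevant range (10-50 here);
# predicting at X = 500 would be extrapolation.
```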
Zero-day Attack
Zero-day attacks, exploits, and incidents are the same thing: attacks for which there is no known software patch. Zero-day attacks (or zero-day exploits/incidents) target publicly known but still unpatched vulnerabilities.