Chapter 17 - Data Analytics
What is Unsupervised Learning?
Description: a) Uses algorithms to analyze unlabeled data sets for hidden patterns b) It does not require human intervention Example: - Clustering - Association - Dimensionality reduction
What is Robotic Process Automation (RPA)?
Involves software that can be programmed and managed easily, using a drag and drop interface that does NOT require coding
A large manufacturer of food packaging products developed an algorithm to predict wear and tear on manufacturing machine parts by collecting historical data like temperature, pressure, and vibration to predict breakdowns and alert maintenance to replace parts and prevent unplanned downtime
IoT
What is Reinforcement Learning?
Description: a) Focuses on decision making by rewarding desired behaviors and/or punishing undesired behaviors b) Decisions are made sequentially by the AI and each decision is rewarded or punished to train the AI Example: - Task learning - Skills acquisition
RPA uses conditional statements, in the form of "If this then that" to pre-program "rules" or decisions of that software (T/F)
True
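The "if this then that" rule style can be sketched in a few lines of Python. This is only an illustration: the invoice fields, the 10,000 threshold, the queue names, and the function name `route_invoice` are hypothetical and not taken from any RPA product.

```python
# Hypothetical rules-based routing in the "if this, then that" style.
def route_invoice(invoice):
    """Return the queue an invoice is sent to, using fixed, pre-programmed rules."""
    if invoice["amount"] > 10_000:      # IF the amount is large THEN ask a manager
        return "manager_approval"
    if not invoice["po_number"]:        # IF there is no purchase order THEN flag it
        return "exception_queue"
    return "auto_pay"                   # OTHERWISE pay automatically


print(route_invoice({"amount": 250.0, "po_number": "PO-1"}))
```

Every decision is written out in advance, so the software never has to "learn" anything. That is what separates RPA from machine learning.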
What is Network Analysis?
an analytics technique that visualizes relationships among participants in a data set to learn about the social structure those relationships create.
In the music industry, this technology is used for smart contracts. By entering into a decentralized, transparent contract, musicians agree to royalties and are paid in full and on time without the involvement of intermediaries
blockchain
What are some reactions to virtually
business are
Companies are hiring well-rounded accounting students with __________________ who also understand technology use cases and data fundamentals
core accounting knowledge
An accounting firm uses this technology to count cattle in fields during the audits of its farming clients
drones
What is an Emerger?
- New technology - not yet regularly used by companies
What are the risks for an early adopter?
1) Business disruptions - due to integration issues with existing systems 2) Financial Loss - from investments if the technology fails 3) Security vulnerabilities - due to unknown variables in the technology 4) Regulatory risk - Due to new or unrefined regulatory requirements
What are the 2 key factors for Data Composition?
1) Categorical values 2) Quantitative values
What are examples of Data Analysts?
1) Data Engineer 2) Data Scientist 3) Statisticians 4) Data Analyst 5) Business Analyst
What are the 5 characteristics for good data?
1) Data integrity 2) Data Retention 3) Data Validity 4) Data Transparency
What are the 3 strengths of Block Chain?
1) Decentralization - no need for intermediary or a central authority 2) Transparency - servers communicate with all network participants 3) Immutability - assurance that information and assets are secure
What is Prescriptive analytics?
1) Description: - Identifies what we should do - Requires advanced programming - Uses a decision-making protocol and historical data to train a program on what to do in a real-time situation 2) Key characteristics - Requires advanced technology and algorithms - Uses the 3 previous analytics types for insights - Based on live, historical, and external data 3) Questions answered - How should we act? - Should we make this decision?
What is Diagnostic analytics?
1) Description: - Provides insights into why something happened - often looks at data in a variety of ways to identify trends or investigate causes 2) Key characteristics - Used to troubleshoot and investigate issues - based on historical data 3) Questions answered - Why did it happen? - Where did this come from?
What is Descriptive analytics?
1) Description: - Tells us what has happened - Looks at historical data and condenses it into smaller, more meaningful bits of information 2) Key characteristics - Easy to access - Easy to visualize 3) Questions answered - What is happening? - When did it happen?
What is Predictive analytics?
1) Description: - uses statistical modeling and algorithms to predict what is likely to happen - provides powerful tools that assist in decision making and inform future actions 2) Key characteristics: - uses statistical modeling and predictive algorithms - based on historical and live data 3) Questions answered - What is likely to happen? - What is the logical outcome?
What are the 4 most widely used categories of data analysis from the Data Storage and Analysis?
1) Descriptive 2) Diagnostic 3) Predictive 4) Prescriptive
What are the 2 steps to summarize data?
1) Group according to categorical value (product name) 2) Perform mathematical functions on the quantitative value (price)
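The two summarization steps can be sketched in plain Python. The sales rows and the function name `summarize` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sales rows: (product name, price)
sales = [
    ("Widget", 10.0),
    ("Gadget", 25.0),
    ("Widget", 12.0),
    ("Gadget", 25.0),
    ("Widget", 8.0),
]


def summarize(rows):
    """Step 1: group by the categorical value. Step 2: do math on the quantitative value."""
    groups = defaultdict(list)
    for name, price in rows:            # step 1: group prices under each product name
        groups[name].append(price)
    return {
        name: {                         # step 2: count, sum, and average each group
            "count": len(prices),
            "total": sum(prices),
            "average": sum(prices) / len(prices),
        }
        for name, prices in groups.items()
    }


summary = summarize(sales)
for name, stats in summary.items():
    print(name, stats)
```

The output has one row per product, which is the kind of smaller, easier-to-read data summary the cards describe.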
How to mitigate the risks? (3)
1) Internal audit a) Trusted advisors b) Pre-implementation reviews c) Post-implementation reviews d) Reporting to stakeholders 2) Emerging Technology COE a) Identifying risks b) Selecting risk response c) Managing risk appetite 3) Enterprise Risk Management a) Assessing project feasibility b) Subject matter expertise
What do Descriptive and Diagnostic have in common?
1) Look at the past and use historical data to learn more about what occurred and why it occurred a) Descriptive = what has happened? b) Diagnostic = why did it happen? 2) These 2 types of analysis are LESS complex than predictive and prescriptive, which analyze historical data and also predict future events, providing recommendations on what the business should do.
Sentiment analysis can:
1) Rate the emotions underlying a text communication on a scale from negative to positive 2) Detect specific emotions, such as happiness, anger, sadness, or disappointment 3) Connect a sentiment to a cause by determining what aspect of the text is causing the sentiment 4) Sort big data based on emotional context 5) Discover patterns in big data based on emotional context
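The first item, rating text on a negative-to-positive scale, can be sketched with a tiny word list. This is a deliberate simplification: real sentiment tools typically rely on NLP and ML models, and the word lists and the function name `sentiment_score` here are hypothetical.

```python
import string

# Hypothetical, very small sentiment word lists (assumption for illustration only).
POSITIVE = {"good", "great", "love", "happy", "excellent"}
NEGATIVE = {"bad", "terrible", "awful", "angry", "disappointed"}


def sentiment_score(text):
    """Return a score from -1.0 (negative) to +1.0 (positive); 0.0 means neutral."""
    # Lowercase the text and drop punctuation so "Terrible," matches "terrible".
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    words = cleaned.split()
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    if positives + negatives == 0:
        return 0.0
    return (positives - negatives) / (positives + negatives)


print(sentiment_score("I love this great product"))
print(sentiment_score("Terrible, awful service"))
```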
To be feasible, a proposed RPA use case must fit into the rigid structure of the rules-based programming RPA uses. What are the 5 basic characteristics of a feasible RPA project?
1) Routine 2) Consistent 3) Digital 4) Time-consuming 5) Performed Frequently
What are the 3 main Machine Learning Approaches?
1) Supervised Learning 2) Unsupervised Learning 3) Reinforcement Learning
What are the 3 important considerations when investigating time series data?
1) Time Trend - consistent movement that does not repeat itself 2) Seasonality - a consistent movement that repeats on a regular basis 3) Noise
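One simple way to separate a trend from seasonality is a moving average whose window equals the length of the season. The sketch below assumes made-up quarterly revenue built from a steady trend plus a repeating pattern, with noise left out to keep the arithmetic easy to follow. The function name `moving_average` is hypothetical.

```python
def moving_average(series, window):
    """Trailing moving average: one value for each full window of observations."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


# Hypothetical quarterly revenue = trend + seasonality (no noise in this example).
trend = [100 + 2 * t for t in range(12)]   # grows by 2 every quarter
seasonal_pattern = [5, -5, 3, -3]          # repeats every 4 quarters and sums to 0
revenue = [trend[t] + seasonal_pattern[t % 4] for t in range(12)]

smoothed = moving_average(revenue, window=4)
print(revenue)
print(smoothed)
```

Because every window covers one full season, the repeating pattern cancels out and the smoothed series rises by a steady 2 per quarter. With real data, whatever does not fit the trend or the seasonal pattern is treated as noise.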
Block chain overview:
1) Someone requests a transaction 2) The requested transaction is broadcast to a P2P network consisting of computers known as NODES 3) The P2P network of nodes validates the transaction and the user's status using known ALGORITHMS 4) Once verified, the transaction is combined with other transactions to create a new BLOCK OF DATA for the ledger * The new block is then added to the existing chain
What is Simulation?
1) Uses complex calculations to predict the outcomes and probabilities associated with a decision that influences a data set: a) It can study the causes and effects of various actions. For example, a sales simulation can use market research data to predict the outcome of sales in a region if a new product line is added. b) Simulations are adjusted to test different actions, and the predictions are reviewed each time the simulation is run to evaluate how the actions affect the model.
What is Blockchain?
Blockchain is an encrypted, distributed database shared across multiple computers or nodes that are part of a community or system.
What is Block Chain?
A system that enables the recording of digital transactions packaged in BLOCKS that form a sequence, like a chain, in a peer-to-peer network
What are Geospatial Analytics?
A technique that gathers, transforms, and visualizes geographic data and imagery, including satellite photographs, Global Positioning System (GPS) coordinates, and more. Advantage: - Adds one more dimension (location) to traditional data. - Creates maps that show changes over time and where these changes took place. Maps help us visualize the added dimension and make it easy to identify patterns
What is Sentiment Analysis?
A technique used to detect favorable and unfavorable opinions toward specific products and services using large numbers of textual data sources
What is the Gamification concept?
Adding game mechanics into nongame environments, like a website, online community, learning management system or business' intranet to increase participation.
What are Time Series?
Captures data that occurs in chronological order across a period of time. * Methods used to analyze time series data to extract meaningful statistics or characteristics of the data
What is Business Process Automation?
a) Companies are implementing robotic process automation and artificial intelligence b) The process of managing information, data, costs, resources, and investments by increasing productivity through automating key business processes with computing technology
What are Labeled Data?
Contains predefined tags or descriptors that an ML algorithm uses to understand the data set or learn from it. - Labeling data is a human act of intervention, which occurs only when using supervised ML analytics
What is Supervised Learning?
Description: a) uses labeled data sets to train the algorithm to classify data or predict outcomes from a data set b) Before it can begin, human intervention is necessary to establish the labels within the data set Example: - Classification - Regression
What considerations are weighed in business and technology advances?
Disruption of activities vs. competitive advantages, efficiency, and opportunities
A property management company uses this technology to survey the condition of land and buildings that its clients own
Drones
What is virtual reality (VR)?
Full immersion into a virtual world with virtual objects
What is Mixed Reality (MR)?
Hybrid combination of real and virtual objects coexisting in real time
What is Artificial Intelligence?
Involves computer systems that are trained to perform tasks that typically require human intelligence - Uses complex algorithms - Learns and solves problems - Without human intervention
in a warehouse, this technology moves inventory past others
robots
What is linear regression?
Methods used to explain the relationship between a dependent variable and single or multiple independent variables (factors hypothesized to affect the dependent variable)
What is a Multiple Regression?
One dependent variable and multiple independent variables
What is a Simple Regression?
One dependent variable and only one independent variable
What is Augmented Reality (AR)?
Overlays reality with digital images
What are Autonomous Things (AuT)?
Physical devices controlled by computers using complex algorithms a) Drones - Using autonomous or semi-autonomous robots to: - e.g., monitor crops for weeds b) Robots
What is Process Mining?
Analytics that reveal, in a visual format, what DID happen in a systems-based business process by using EVENT LOG DATA to show what individuals, systems, and machines did. Used 1) to find deviations from an expected process path 2) to identify processes for efficiency improvement
Each year, the regional audit directors perform an annual audit risk assessment. One metric they use is the "prior year's audits", which indicates how many years it has been since a risk category has been audited. To achieve this, they open the prior year's audit plan and manually add the names of the areas that were audited last year
RPA
In a call center, this technology does website scraping and collects customer data, does the required data manipulation, and gives the call center manager a single view with all the information about a customer
RPA
In a factory, this technology works alongside human workers, augmenting their performance. Their movements are programmed, enabling them to perform specific repetitive manufacturing tasks
Robot
A parking deck being patrolled without the need for security personnel to leave the office is an example of what technology?
Robots
What is a Confidence Interval?
The "lower confidence bound" and "upper confidence bound" are part of a confidence interval, which is an estimate derived from observed data that shows a range of possible values.
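A lower and upper confidence bound can be computed from observed data as sketched below. This version assumes a normal approximation (mean plus or minus z times the standard error). For small samples a t-distribution would usually be preferred. The sample values and the function name `confidence_interval` are hypothetical.

```python
import math
import statistics


def confidence_interval(values, confidence=0.95):
    """Return (lower bound, upper bound) for the mean, using a normal approximation."""
    n = len(values)
    mean = statistics.fmean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)   # sample standard error
    # z-value that leaves (1 - confidence) / 2 in each tail of the normal curve
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err


# Hypothetical observations, e.g. days taken to close the books each month.
observed = [10, 12, 11, 13, 9, 11, 10, 12]
lower, upper = confidence_interval(observed)
print(f"lower confidence bound: {lower:.2f}, upper confidence bound: {upper:.2f}")
```

Asking for more confidence (say 0.99 instead of 0.95) widens the range between the two bounds.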
What are Nodes?
The links between the nodes are the relationships, or interactions, that connect the participants.
What is gamification?
The merger of video game principles and real-world simulations where the user can achieve badges
What is Forecasting?
The process of estimating future events based on the combination of past and present time series data. This predictive analytic method is a staple in accounting data analytics
What is the main difference between Supervised and Unsupervised learning?
The use of human intervention to create labeled data sets
What is Cryptology?
The science of using a secret code for secure data communication, which involves encryption and decryption
What are Independent Variables?
This is the factor that may be influencing the dependent variable. There can be one or more independent variables, depending on the type of regression performed.
What are Dependent Variables?
This is the value to be understood. It is often called the outcome.
What is Machine Learning (ML)?
Uses algorithms and statistical models to train an AI system through patterns and trends in data sets. ML programs systems to perform tasks without explicit instruction and is a popular application of AI in data analytics
A large retailer uses head-mounted displays to train its employees to respond to black friday shopping scenarios
VR
What is the formula for a linear regression?
Y = A + Bx where: Y = Dependent variable x = Independent variable A = Y-intercept (value of Y when x = 0) B = Slope of line
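The intercept A and slope B in Y = A + Bx can be estimated from data with ordinary least squares, as sketched below for a simple regression (one independent variable). The data points and the function name `fit_line` are made up for illustration.

```python
def fit_line(xs, ys):
    """Estimate A (Y-intercept) and B (slope) for Y = A + Bx by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies on its own.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    b = covariance / variance_x
    a = mean_y - b * mean_x          # the fitted line passes through the two means
    return a, b


# Hypothetical data: x = advertising spend, Y = sales.
x_values = [1, 2, 3, 4]
y_values = [3, 5, 7, 9]
A, B = fit_line(x_values, y_values)
print(f"Y = {A} + {B}x")
print("Predicted Y when x = 5:", A + B * 5)
```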
What is a Linear Regression?
a statistical technique we use to estimate the relationships between a dependent variable and one or more independent variables
What is Time trend in time series data?
a) A consistent movement in the time series data that does not repeat. e.g., an increase in revenue through the fiscal year due to a new product launch
What is Seasonality in time series data?
a) A consistent movement in the time series data that repeats on a regular basis. e.g., an increase in revenue every Nov and Dec due to winter holiday sales
What is Social Network Analysis?
a) A type of network analysis that investigates social structures on social media
What is Noise in time series data?
a) Additional movements in the time series data that cannot be explained as a trend or seasonality. e.g., a drastic spike in revenue at the end of Feb due to a large customer order for a one-time corporate event.
What is Augmented reality for consumer facing and business operations?
a) All of the data is used for marketing and future product design
What is Anomaly Detection?
a) Also known as outlier analysis, reveals observations or events that are outside a data set's normal behavior
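One basic form of outlier analysis flags values that sit far from the average, measured in standard deviations (a z-score). The cutoff of 2.0, the sample amounts, and the function name `find_outliers` below are assumptions for illustration, and other detection methods exist.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return the values whose z-score exceeds the threshold in absolute value."""
    mean = statistics.fmean(values)
    std_dev = statistics.pstdev(values)     # population standard deviation
    if std_dev == 0:
        return []                           # all values identical: nothing stands out
    return [v for v in values if abs(v - mean) / std_dev > threshold]


# Hypothetical daily expense claims; one is far outside normal behavior.
claims = [100, 102, 98, 101, 99, 100, 103, 97, 250]
print(find_outliers(claims))
```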
What are the 3 Consumer-Facing IoT products?
a) Appliances - Manufacturers leveraging big data collected from consumer IoT products to: - Design new products - Create marketing opportunities b) Vehicles - Manufacturers leveraging big data collected from consumer IoT products to: - Design new products - Create marketing opportunities c) IoT Wearables - Insurance companies using big data collected from wearables to: - Monitor heart rate, calorie burn, sleep quality, and more - Estimate an individual's longevity to calculate a premium for that individual's lifestyle
What is a Data Engineer?
a) Builds the tech infrastructure and architecture for gathering, growing, and storing raw data b) Data modeling, hardware, networking
What are Categorical Values?
a) Descriptive components within a data set b) e.g., the gender identity of an employee is a categorical value in the main employee table of a human resources database c) Categorical values are qualitative in nature
What are the 3 stages of increasing adoption?
a) Emerging b) Disruptive c) Widely used
What is Rogers' adoption curve?
a) Innovators (2.5%) b) Early adopters (13.5%) c) early majority (34%) d) late majority (34%) e) Laggards (16%)
What is Data Summarization?
a) Involves simplifying data to quickly identify the composition of categorical and quantitative values. b) Can compress the data into smaller, easier-to-understand outputs called data summaries with specific columns
What are Exploratory Data Analytics?
a) Techniques reveal the key characteristics of a data set. b) Whether this type of analytics is performed using ML algorithms or another technique, the purpose is the same
What is X-reality?
a) VR b) AR c) MR
How does Classification differ from Clustering?
a) While both are methods of categorization, they differ in the type of ML they use: 1) Clustering uses unsupervised ML to analyze unlabeled data inputs 2) Classification uses supervised ML to analyze labeled data inputs
What is a Data Scientist?
a) Works with large volumes of data. b) Designs and programs algorithms to collect and analyze data and perform predictive analytics c) Data modeling, programming/coding, statistics
What is Natural Language Processing (NLP)?
a) a form of textual analysis that gathers, processes, and interprets meaning from human language. b) NLP uses ML algorithms to learn the meanings of words and improve interpretation based on what the algorithms learn.
What is Classification Analysis?
a) Categorizes data points into groups based on similarities noted in a previously defined data label, e.g. - loan applications (credit risks) - spam detection - donation solicitation
What is Clustering?
a) categorizes data points into groups based on their similarities without human intervention e.g - cluster songs based on length, popularity, danceability, acoustic levels, and energy - cluster subscribers based on average claim costs and specialists visited. suggest health plans to the subscriber
What is Textual Analysis?
a) category of data analytics techniques used to interpret objects that include words. b) Textual analysis can interpret text transcriptions of conversations, emails, process narratives and flowcharts, and more to analyze a business process.
What is Event Log Data?
a) data about activities in a system and includes the timestamp of when the activities occur. b) For example, when a person interacts with a system's user interface to input new data, the system captures a record and timestamp for that update. This record and timestamp are part of the system's event log data.
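Event log data like this is what process mining uses to find deviations from an expected process path. A minimal sketch follows, assuming a made-up purchase-order log of (case ID, timestamp, activity) records. The activity names and the function name `find_deviations` are hypothetical.

```python
from collections import defaultdict

# Hypothetical expected process path for a purchase order.
EXPECTED_PATH = ["create_po", "approve_po", "receive_goods", "pay_invoice"]

# Hypothetical event log: (case id, ISO timestamp, activity).
event_log = [
    ("PO-1", "2024-01-02T09:00", "create_po"),
    ("PO-1", "2024-01-03T10:30", "approve_po"),
    ("PO-1", "2024-01-10T14:00", "receive_goods"),
    ("PO-1", "2024-01-15T08:45", "pay_invoice"),
    ("PO-2", "2024-01-04T11:00", "create_po"),
    ("PO-2", "2024-01-05T16:20", "pay_invoice"),     # paid before approval
    ("PO-2", "2024-01-06T09:10", "approve_po"),
    ("PO-2", "2024-01-12T13:00", "receive_goods"),
]


def find_deviations(log, expected_path):
    """Rebuild each case's actual path from timestamps and return the paths that differ."""
    paths = defaultdict(list)
    # ISO-formatted timestamps sort correctly as plain text.
    for case_id, _timestamp, activity in sorted(log, key=lambda e: (e[0], e[1])):
        paths[case_id].append(activity)
    return {case: path for case, path in paths.items() if path != expected_path}


print(find_deviations(event_log, EXPECTED_PATH))
```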
What are Clusters?
a) groups determined by the distance between individual items, which indicates how closely related the data points are b) Clustering is an unsupervised ML technique in which the data input contains unlabeled data
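Grouping by distance can be sketched with k-means, one common clustering algorithm (the notes do not name a specific one). The version below works on single numbers and takes starting centers as input. The data, the starting centers, and the function name `k_means_1d` are assumptions for illustration.

```python
def k_means_1d(points, centroids, iterations=10):
    """Group unlabeled numbers around the nearest center, then move each center to its group's mean."""
    centroids = list(centroids)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster whose center is closest.
        labels = [
            min(range(len(centroids)), key=lambda c: abs(p - centroids[c]))
            for p in points
        ]
        # Update step: each center moves to the average of its members.
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:                      # keep the old center if a cluster is empty
                centroids[c] = sum(members) / len(members)
    return labels, centroids


# Hypothetical unlabeled data, e.g. average claim costs in thousands.
costs = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
labels, centers = k_means_1d(costs, centroids=[1.0, 10.0])
print(labels)
print(centers)
```

No labels are supplied at any point. The groups come only from how close the values are to each other, which is why clustering counts as unsupervised ML.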
What are the weaknesses of Block Chain?
a) malicious collusive parties may control more than half of the computing power, the network, and the process of recording new block
What are Quantitative values?
a) numeric data points that can be summed, counted, or otherwise analyzed using mathematical operations b) Dates and dollar amounts are 2 examples of quantitative data values
What are Statisticians?
a) provides a methodology for drawing conclusions from data b) Statistics, data
It is a sequence of blocks containing __________ ledgers of transactions These blocks of data are stored on _____ , which are the computers, servers of participants in the network
a) unchangeable * each block in the chain has logical relationship with the preceding block b) nodes
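The link between each block and the preceding block can be sketched with hashes: every block stores the hash of the block before it, so changing an old block breaks the chain. This is a toy model only, with no network, nodes, or mining. The function names and transactions are hypothetical.

```python
import hashlib
import json


def block_hash(index, transactions, previous_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "transactions": transactions, "previous_hash": previous_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_block(chain, transactions):
    """Append a new block that points back at the last block in the chain."""
    previous_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append({
        "index": index,
        "transactions": transactions,
        "previous_hash": previous_hash,
        "hash": block_hash(index, transactions, previous_hash),
    })


def is_valid(chain):
    """Check that every block's hash is correct and links to the block before it."""
    for i, block in enumerate(chain):
        expected_previous = chain[i - 1]["hash"] if i > 0 else "0" * 64
        if block["previous_hash"] != expected_previous:
            return False
        recomputed = block_hash(block["index"], block["transactions"], block["previous_hash"])
        if block["hash"] != recomputed:
            return False
    return True


ledger = []
add_block(ledger, ["Alice pays Bob 5"])
add_block(ledger, ["Bob pays Carol 2"])
print(is_valid(ledger))                            # chain is intact

ledger[0]["transactions"] = ["Alice pays Bob 500"]  # tamper with an old block
print(is_valid(ledger))                            # tampering is detected
```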
What is a Monte Carlo simulation?
a) A simulation that predicts the probability of different outcomes in the presence of many random variables. To use it, we generate models of potential outcomes and conduct a risk analysis for each one. Monte Carlo can use reinforcement learning to learn directly from its experiences to evolve the model. e.g., Investment firms: It can predict the value of a portfolio based on various investment options while considering the uncertainty of the financial market and other external factors.
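The portfolio example can be sketched as a plain Monte Carlo run: simulate many possible futures with random yearly returns, then count how often each outcome occurs. The return and volatility figures, the normally distributed returns, and the function name `simulate_portfolio` are assumptions for illustration. No reinforcement learning is involved in this sketch.

```python
import random


def simulate_portfolio(start_value, mean_return, volatility, years, trials, seed=0):
    """Return the ending portfolio value from each simulated trial."""
    rng = random.Random(seed)               # fixed seed so the run is repeatable
    outcomes = []
    for _ in range(trials):
        value = start_value
        for _ in range(years):
            # Assumption: each year's return is drawn from a normal distribution.
            value *= 1 + rng.gauss(mean_return, volatility)
        outcomes.append(value)
    return outcomes


outcomes = simulate_portfolio(10_000, mean_return=0.06, volatility=0.15, years=10, trials=5_000)
probability_of_loss = sum(v < 10_000 for v in outcomes) / len(outcomes)
print(f"Average ending value: {sum(outcomes) / len(outcomes):,.0f}")
print(f"Estimated probability of ending below the starting value: {probability_of_loss:.1%}")
```

Changing the inputs (for example, a different mix of investments with a different expected return) and rerunning shows how the range of outcomes shifts.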
What are miners?
People who create the ledgers of transactions in chained blocks, using their computing power (nodes in the network) to solve the math encryption puzzles that secure the transactions
Potential
potential risks associated with new technology
What is Internet of things?
The idea that everything / every device could be given an IP address and put on the internet. Ex. using your phone to turn on your microwave