CPCU 550: Maximizing Value with Data and Technology


Domain knowledge

Knowledge of the context of the information a data scientist is working with. Q) As an underwriter, Jake is considered a valuable member of the data science team at Durham Insurance. Which one of the following necessary skills does Jake contribute to the data science team? A) Domain knowledge
After data is selected to be mined, it must then be prepared, or cleaned, to eliminate missing or inaccurate data points. Data is selected and cleaned in accordance with the specific goal of the data-mining endeavor.

Accelerometers

Measures changes in speed Q) Among a wide variety of sensors used by Mega Manufacturing, Inc., are units that measure the range of motion of a given piece of machinery. Which one of the following specialized sensors would be used for this purpose? A) Accelerometers Q) In her role at the Federal Emergency Management Agency (FEMA), Sabrina monitors a variety of technologies to help provide advance warning of pending catastrophes. Which one of the following types of disaster would she be alerted to by an accelerometer? A) Earthquake

Sensor Categories and Examples for Risk Prediction and Prevention

Mechanical- Pressure sensors, flow sensors, motion detectors
Biochemical- Home diagnostic tests, wearable fitness monitors, diabetes test strips/meters/patches
Thermal- Smoke detectors, heat sensors, computer hardware sensors
Radiant- Optical sensors, radar, radio frequency identification (RFID) tags

Linda has a driving record free of accidents and traffic violations, and telematics data indicates that she is a safe and careful driver who represents a minimal level of loss exposure. Her insurer is most likely to place her into which one of the following rating categories?

Preferred

Through data-driven decision making, data science helps insurers and risk managers improve their business results in the ways listed below. To achieve their operational goals and effectively use data science to enhance their performance and profitability, insurers rely on internal and external data, which can be structured or unstructured, from a variety of sources.

- Automating decision making for improved accuracy and efficiency—Providing online quotes for personal auto insurance based on a computer algorithm has become commonplace.
- Organizing large volumes of new data—By organizing data according to various characteristics, an insurer could, for example, use telematics data to examine speed, braking patterns, left turns, and distance traveled.
- Discovering new relationships in data—For example, a risk manager could identify the characteristics of workers who never had a workplace accident and determine whether correlations could be gleaned to improve safety for all workers.
- Exploring new sources of data—An insurer could use text mining to analyze claims adjusters' notes for various purposes, such as developing an automated system to predict high-severity claims, and then assign resources as appropriate.
ANOTHER EXAMPLE: Risk manager Mustafa uses data science to improve his business results. Data science helps him to discover new relationships in data, automate decision making, organize large volumes of data, and explore new sources of data. Q) Which one of the following factors of data science allows Mustafa to identify correlations in health and lifestyle that indicate a life insurance applicant is more prone to certain diseases? A) Discovering new relationships in data

Data Science Concepts

1. A question or problem is raised
2. Research is conducted
3. A hypothesis is formed
4. Experiments are performed
5. Data from the experiments is analyzed
6. A conclusion is reached
7. The conclusion is communicated

GreatAmerica Insurance Co. uses systems and software that apply both augmented intelligence and autonomous intelligence to its operations. Because both of these can learn to improve the accuracy of calculations over time, they are examples of

Adaptive systems

The sum of probabilities in a probability distribution must be

1.00 Q) Which one of the following describes how probabilities are expressed? A) Numerically, as a fraction, a percentage, or a decimal

Arty has created a predictive model for bodily injury claimants who are likely to magnify their symptoms. He has received a new claim, and this instance has four nearest neighbors: three involve claimants who magnified their symptoms, and one was legitimate. Arty refers to this data point as

4-NN
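A minimal k-nearest-neighbor sketch in Python (not from the CPCU text; the features and claim data are hypothetical) showing how a 4-NN majority vote like Arty's works:

```python
# Minimal 4-NN sketch (hypothetical features and data) using scikit-learn.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Each historical claim: [claimant age, reported pain score, days out of work]
X_train = np.array([
    [34, 8, 120], [51, 9, 200], [42, 7, 150],   # magnified symptoms
    [29, 3,  10], [45, 4,  21], [38, 2,   5],   # legitimate
])
y_train = np.array([1, 1, 1, 0, 0, 0])  # 1 = magnified, 0 = legitimate

# k = 4: the prediction is a majority vote of the four nearest neighbors.
model = KNeighborsClassifier(n_neighbors=4)
model.fit(X_train, y_train)

new_claim = np.array([[40, 8, 140]])
print(model.predict(new_claim))        # predicted class (1 = likely magnifying)
print(model.predict_proba(new_claim))  # e.g., 3 of 4 neighbors magnified -> 0.75
```

Because three of the four nearest neighbors are magnified-symptom claims, the majority vote classifies the new claim as likely magnifying.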

What makes an insurer predictive model fair and transparent?

A fair model allows an insurer to make risk and pricing decisions based on data that's both accurate and truly predictive of the expected future cost of coverage. A transparent model allows an insurer to demonstrate the rationale for risk-related decisions and illustrates how consumer data is collected and used. It counters the perception of insurers' algorithms making decisions based on results from so-called black boxes that generate mysterious calculations from consumer data.
These make a predictive model transparent:
- Illustrate how consumer data is collected
- Demonstrate the rationale for risk-related decisions
- Illustrate how consumer data is used
***NOT TRANSPARENT*** Justify the level of profit required

Law of large numbers

A mathematical principle stating that as the number of similar but independent exposure units increases, the relative accuracy of predictions about future outcomes (losses) also increases.
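A quick simulation (illustrative only; the 10 percent loss probability is an assumption) showing the law of large numbers at work: the observed loss frequency stabilizes as the number of independent exposure units grows.

```python
# Illustration of the law of large numbers: as the number of independent
# exposure units grows, the observed loss frequency converges toward the
# true underlying probability (assumed here to be 0.10).
import numpy as np

rng = np.random.default_rng(seed=1)
true_loss_probability = 0.10

for n in (100, 1_000, 10_000, 100_000):
    losses = rng.random(n) < true_loss_probability
    print(f"{n:>7} exposure units -> observed frequency {losses.mean():.4f}")
```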

Eastern Appalachian Mutual is a smaller, regional insurer that competes with larger insurers such as Greater American Insurance Company. Which one of the following might help it compete more efficiently with larger organizations?

A mobile app

Actuary

A person who uses mathematical methods to analyze insurance data for various purposes, such as to develop insurance rates or set claim reserves. Actuaries review the data.
Q) Which of the following is an example of an insured's PII that may be protected by the Health Insurance Portability and Accountability Act of 1996 (HIPAA)? A) An insured's address is an example of PII.
Q) Grace is an actuary. What metrics can she use to evaluate a model's performance? A) Accuracy, precision, recall, and F-score
Q) Before using a new predictive model for machine injury accidents, Shelton Manufacturing wanted to evaluate its performance on data that was not used to train the model. One of the metrics that it used was to divide the number of true positive results by the total number of positive results. A) Precision
Through data mining, the modeling process is continually evaluated and refined. When performed correctly and conscientiously, data mining serves as a legitimate, ethical means of drawing conclusions from data.

Predictive modeling

A process in which historical data based on behaviors and events is blended with multiple variables and used to construct models of anticipated future outcomes.

Performance Metrics

Accuracy—A measure of how often the model predicts the correct outcome. The formula for accuracy is (TP + TN) ÷ (TP + TN + FP + FN), where TP equals true positive, TN equals true negative, FP equals false positive, and FN equals false negative.
Precision—Instead of looking at the total results of a model, precision measures only the POSITIVE results. It's usually a better measure of a model's success than accuracy. The formula for precision is TP ÷ (TP + FP).
Recall—This is a measure of how well the model catches positive results. The formula for recall is TP ÷ (TP + FN).
F-score—This is a popular way of evaluating a predictive model because it considers both precision and recall. The formula for F-score is 2 × ([Precision × Recall] ÷ [Precision + Recall]).
These metrics can best be understood when used with a confusion matrix. This matrix shows the level of confusion within a model as it makes predictions. It does this by revealing the amount and types of errors made using the model.
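A short Python example (hypothetical confusion-matrix counts) applying the four formulas above:

```python
# Computing the four performance metrics from a confusion matrix.
# The counts below are hypothetical; TP/TN/FP/FN follow the definitions above.
tp, tn, fp, fn = 40, 30, 10, 20

accuracy  = (tp + tn) / (tp + tn + fp + fn)   # how often the model is right
precision = tp / (tp + fp)                    # quality of positive predictions
recall    = tp / (tp + fn)                    # how well positives are caught
f_score   = 2 * (precision * recall) / (precision + recall)

print(f"accuracy={accuracy:.2f}, precision={precision:.2f}, "
      f"recall={recall:.2f}, F-score={f_score:.2f}")
```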

What steps should he take toward processing a claim?

- Acknowledge the claim and assign it to a claims representative
- Identify the policy
- Contact the insured or the insured's representative
- Investigate and document the claim
- Determine the cause of loss, liability, and loss amount
- Conclude the claim

Energy transfer theory

An approach to accident causation that views accidents as energy that is released and that affects objects, including living things, in amounts or at rates that the objects cannot tolerate. Q) Modifying the contact surface or basic structure that can be affected by installing breakaway highway light poles or requiring front and side airbags in automobiles to cushion occupants' impact is a basic strategy of the A) Energy transfer theory The energy transfer theory posits that the basic cause of accidents is energy out of control. Its approach to preventing accidents or reducing the resulting damage focuses on controlling released energy and/or reducing the harm caused by that energy. Basic strategies of the energy transfer theory deal with maintaining safe distances between objects that may move at great speed, such as separating pedestrians and motor vehicles, and ensuring that objects that have the potential to move at great speeds also have the ability to slow themselves down or stop themselves completely, such as equipping elevators with emergency brakes or installing physical barriers within buildings to prevent the spread of fires or floods.

Types of Insurer Operational Data

Q) What's the difference between data mining and data dredging? A) Data mining involves analyzing large amounts of data to find relationships and patterns that will help in developing business solutions. Data dredging (also referred to as data fishing), on the other hand, is a deliberate search for any relationships between data—even those that are insignificant. Ensuring fairness also requires models to be based on data mining—as opposed to data dredging, which often results in drawing unsupported or premature, and therefore perhaps unfairly discriminatory, conclusions from data. Data dredging can also be used to confirm bias.

An insurer's operational data includes policy and premium data, accounting data, claims data and notes, billing data, and producer data. Each can be used in predictive modeling.
Q) Sofia uses insurer operational data such as policy and premium data, accounting data, claims data and notes, and billing data in her analyses. Which one of the following types of data does Sofia refer to when she needs to analyze loss adjustment expenses, allocated loss adjustment expenses, and unallocated loss adjustment expenses? A) Accounting data
Policy and Premium Data- An insurer's profits depend heavily on premium revenue. Insurers use rates based on the insured's loss exposures to determine the premium to charge for insurance policies. Many types of ratemaking analysis require information about policy exposures and premiums linked with their corresponding claims and loss information. This data is often broken into two separate databases: a policy database and a claims database.
Accounting Data- Examples of accounting data include underwriting expenses, loss adjustment expenses (LAE), allocated loss adjustment expenses (ALAE), and unallocated loss adjustment expenses (ULAE).
Claims Data and Notes- In addition to housing a policy database, most companies maintain a database to capture information about the claims within each book of business. In a claims database, each record generally represents a transaction tied to a specific claim (such as a loss payment or change in reserves) or provides a snapshot of the claim's status, showing cumulative payments and current reserve. Similar to the policy database, claims that involve multiple coverages or causes of loss may be represented by separate records or through indicator fields.

Data science

There are several fundamental concepts of data science. How many can you name? These are four fundamental concepts of data science:
- Systematic processes can be used to discover useful knowledge from data.
- Information technology can be applied to big data to reveal the characteristics of groups of people or events of interest.
- Analyzing data too closely can result in interesting findings that are not generally useful.
- Data mining approaches and results must be thoughtfully considered in the context in which the results will be applied.

An interdisciplinary field involving the design and use of techniques to process very large amounts of data from a variety of sources and to provide knowledge based on the data. Through data-driven decision making, data science helps insurers and risk managers improve their business results by:
1) Automating decision making for improved accuracy and efficiency—Providing online quotes for personal auto insurance based on a computer algorithm has become commonplace.
2) Organizing large volumes of new data—By organizing data according to various characteristics, an insurer could, for example, use telematics data to examine speed, braking patterns, left turns, and distance traveled.
3) Discovering new relationships in data—For example, a risk manager could identify the characteristics of workers who never had a workplace accident and determine whether correlations could be gleaned to improve safety for all workers.
4) Exploring new sources of data—An insurer could use text mining to analyze claims adjusters' notes for various purposes, such as developing an automated system to predict high-severity claims, and then assign resources as appropriate.

Algorithm/ Algorithms

An operational sequence used to solve mathematical problems and to create computer programs. Modeling software—both predictive and descriptive—uses algorithms to mine and analyze data. Algorithms can take a number of forms, such as mathematical equations, classification trees, and clustering techniques. After an insurer selects the business objectives of its model and the data to be analyzed, it chooses an algorithm. Strictly speaking, an algorithm is different from a model, which is an attempt to represent the state of something. However, in the world of data analytics, the terms are often used interchangeably.

Data-driven decision making

To gain a competitive advantage from data analytics, insurers need people with the skills to manage rapidly evolving types and sources of data in ever-increasing amounts. They also need individuals who understand the interaction among the various activities in the insurance value chain and how data and the decisions made with it can affect those activities. Insurers have typically employed individuals with the education and skills to evaluate data and determine specific results, such as prices for products and claim reserves. However, without additional training in data science, these people may not be able to effectively apply data analytics to big data.

An organizational process to gather and analyze relevant and verifiable data and then evaluate the results to guide business strategies. Q) What's the difference between descriptive and predictive analytics? A) Descriptive analytics looks at data to determine what's already occurred. Predictive analytics, as its name suggests, looks at data to determine what's likely to happen at some point in the future. Following the process outlined in the decision-making model will help ensure the best results. An important first step is to define the problem within a business context. Without establishing the reason it matters to the business, it's unlikely that modeling and analyzing data will be effective. Insurers use various techniques to gather, categorize, and analyze unstructured data, such as by analyzing free-form claims notes to show patterns that may not be readily captured in a more standard data analysis.

Probabilities

Any probability can be expressed as a fraction, percentage, or decimal. For example, the probability that a coin will land with its heads side facing up can be expressed as 1/2, 50 percent, or 0.50. The probability of an impossible event is 0, and the probability of a certain event is 100 percent or 1.0. Therefore, the probabilities of all events that are neither totally impossible nor absolutely certain are greater than 0 but less than 100 percent (or 1.0).
Probability analysis- A technique for forecasting events, such as accidental and business losses, on the assumption that they are governed by an unchanging probability distribution. The sum of the probabilities in a probability distribution must total 1.0.
Probability distribution- A presentation (table, chart, or graph) of probability estimates of a particular set of circumstances and of the probability of each possible outcome. Probabilities associated with events such as coin tosses can be developed from theoretical considerations and are unchanging.
Q) Which one of the following statements is correct with respect to continuous probability distributions? A) One way of presenting a continuous probability distribution is to divide the distribution into a countable number of bins
Q) Mehmet is a risk manager who has been asked to conduct a probability analysis of the distribution of claim numbers and severity. Because there are potentially an infinite number of claim values, Mehmet creates ranges of claim severity and records the probability of claims in each range. Which one of the following analyses has Mehmet created? A) Continuous probability distribution
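A small Python illustration (hypothetical probabilities) of a discrete claim-count distribution and a binned continuous severity distribution like Mehmet's, each summing to 1.0:

```python
# A discrete probability distribution for annual claim counts (hypothetical
# probabilities), and a continuous severity distribution presented as bins.
claim_count_distribution = {0: 0.60, 1: 0.25, 2: 0.10, 3: 0.05}
assert abs(sum(claim_count_distribution.values()) - 1.0) < 1e-9  # must total 1.0

# Binning claim severity, as in Mehmet's continuous distribution example:
severity_bins = {
    "$0 - $10,000":       0.70,
    "$10,001 - $50,000":  0.20,
    "$50,001 - $250,000": 0.08,
    "over $250,000":      0.02,
}
print(sum(severity_bins.values()))  # 1.0
```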

Unfair discrimination

Applying different standards or methods of treatment to insureds who have the same basic characteristics and loss potential, such as charging higher-than-normal rates for an auto insurance applicant based solely on the applicant's race, religion, or ethnic background.

The Evolving Regulatory Response to Modeling-Related Bias

As issues of equity take greater precedence, scrutiny of insurers' practices and pressure for regulators to respond also grow. Keeping abreast of developments in areas like predictive modeling algorithms, artificial intelligence, and machine learning can be a challenge for regulators, but it is essential because laws and regulations in these areas may lose relevance as these technologies advance. However, the NAIC has issued its Principles on Artificial Intelligence, which, along with unfair trade practices acts, are meant to guide insurers in developing models that are fair and ethical; accountable; compliant; transparent; and safe, secure, and robust.

Charlotte is a risk manager who is developing usage-based insurance ratings. Which one of the following data sources is she most likely to use?

Auto telematics
Using Vehicle Telematics for Ratemaking- Telematics devices, which collect and then transmit data using wireless communication and global positioning systems (GPS), can provide information that helps underwriters better understand insureds' behaviors and the environments in which they operate. These devices are particularly effective at helping underwriters evaluate the driving habits and risks associated with personal and commercial auto policyholders.

Evaluating the Model

Before a business uses a predictive model, its effectiveness on data that wasn't used to train the model should be evaluated. Evaluation also continues after the training and testing process ends, when a business moves a model to production and can see the model's true effectiveness with new data.
Putting the Model Into Production- Even after a business puts a predictive model into production, the training process continues. In fact, that is when the model's true value is proved. If a model's predictions don't guide good business decisions, then the model should be reevaluated. Returning to the workplace injury predictive model example, if the manufacturer's risk manager later finds that the model has been flagging the wrong employees as being likely to have an accident, the organization should retrain and reevaluate the model. Risk management and insurance professionals should also defer to their professional experience when examining a model's results. If the results don't seem to make sense, the model was likely overfitted or isn't complex enough. Regardless of initial accuracy, however, no predictive model will make accurate predictions indefinitely. When significant economic, technological, or cultural changes occur, predictive models should be updated and retrained.

Insurers possess customer data that enables them to perform several key types of segmentation:

Behavioristic segmentation- This involves grouping individuals based on their past behavior, such as purchase history, actions taken on a website, how often customers request a quote, and the benefits they're seeking (lower premiums, managing risks, getting answers to claims questions, and so on). A key benefit of analyzing customers' behavioral patterns is being able to identify common attributes of loyal customers and leverage that information to improve customer retention and satisfaction. EXAMPLE- After browsing for auto policy information on the Great Insurance Company website, Martin begins seeing ads for auto policies in his social media feeds.
Geographic segmentation- This involves dividing markets into geographic units. For example, prospects can be split into groups based on zip code, city, climate, or density (urban, suburban, or rural). The ability to synthesize location-based underwriting and claims data (such as data drawn from vehicle telematics and the Internet of Things) can help insurers further segment markets into increasingly granular customer groups, which can then be paired with more specialized products.
Demographic segmentation- This involves carving out groups based on variables such as age, gender, education, occupation, ethnicity, income, and family size. For commercial lines, key variables may also include company size, company type, number of employees, and corporate structure. Demographic data helps insurers identify prospects who are likely to be receptive to certain products and services.
Q) Commercial lines insurance markets can be segmented demographically based on A) Type of business

Artificial intelligence is powered by machine learning and deep learning technologies. Automated intelligence, assisted intelligence, augmented intelligence, and autonomous intelligence are four types of AI. Before suggesting a specific type to an insured, an insurer would have to discuss with the insured how much decision-making power they want to trust the technology with. Augmented intelligence software is able to learn, so its recommendations improve over time, but it does not make any decisions itself; it helps humans make better decisions.

Big data- Sets of data that are too large to be gathered and analyzed by traditional methods.
Machine learning- Artificial intelligence in which computers continually teach themselves to make better decisions based on previous results and new data. It is a subset of AI and comes from a combination of two other data science skills.
Chatbot- Software that uses artificial intelligence to engage in dialogue with a human and provide simple responses.
Conversational AI- Software that uses artificial intelligence to hold a natural language conversation with a human through a messaging application, text, or website, or over the phone. EXAMPLE- Melanie calls the customer service line for her home internet provider. A chatbot answers the call and is able to respond to several questions and ultimately connect her with a live representative. This is an example of conversational AI.
Deep learning- Insights into data use and processing gained by combining artificial intelligence and machine learning. It is based on algorithms derived from artificial neural networks.
Neural network- A data analysis technique composed of three layers, including an input layer, a hidden layer with nonlinear functions, and an output layer, that is used for complex problems (for example, to predict and prevent machine breakdowns).
Firmware- Software providing basic control for a device's hardware.
Q) As a data scientist, Ida must master the necessary skills for her profession: mathematics and statistics, domain knowledge, computer programming, and machine learning. Which one of the following of Ida's skills comes from a combination of two other skills? A) Machine learning
Q) Melanie calls the customer service line for her home internet provider. A chatbot answers the call and is able to respond to several questions and ultimately connect her with a live representative. This is an example of A) Conversational AI

More data

Billing Data- The types of billing data an insurer collects may depend on the billing options that the insurer offers and whether it bills customers directly or uses the services of a producer or another intermediary for this function. Options—and thus types of data collected and times of collection—include point of sale, online payments, payments by mail, automatic recurring card payments, and walk-in cash payments.
Producer Data- Depending on the extent of their services, producers may handle a wide variety of insured and insurer data. Producers may be the initial contact with the customer and therefore collect and assess the information on the insurance application. Additionally, depending on their role with the insurer, producers may issue policies, collect premiums, provide customer service, handle small or routine claims, and provide varying levels of consulting. These activities create valuable data.

By using sensor technology, GreatAmerica Insurance Co. is able to alert its automotive policyholders via text message to a coming hailstorm, enabling many of them to move their vehicles under cover before the storm. Who among the following benefits from this outreach?

Both the insurer and the insureds

IT capabilities that deliver the right data to the insurer or customer at the right time, in the right place, and in the right form can provide an insurer with a competitive advantage. Such capabilities can increase the operational efficiency of nearly every functional area of an insurer's business, as well as support the insurer's strategy, decision making, and governance and compliance initiatives. In other words, IT intelligence gives insurers better insight for underwriting, product development, and pricing. Operational efficiency can be improved in nearly every functional area of an insurer that harnesses technology correctly. But, as with almost everything, adopting new technology requires an initial cost-benefit analysis.

Business intelligence (BI)- The skills, technologies, applications, and practices used to improve decision-making insights and reinforce information integrity.
Internet of Things (IoT)- A network of objects that transmit data to computers. The Internet of Things is a source of a rapidly growing quantity of external data.
Underwriting- The process of selecting insureds, pricing coverage, determining insurance policy terms and conditions, and then monitoring the underwriting decisions made.
Creating Competitive Advantages- In the modern insurance environment, insurer differentiation—and therefore competitive advantage—can hinge on intelligence, customer relationships, and speed. Advances in IT capabilities have created greater opportunities for insurers to quickly collect and store information, as well as access and verify information from outside sources such as motor vehicle registration and licensing records.
IT decision support- Decision-support applications produce results quickly, and they incorporate business rules with knowledge that has developed over time.
Q) What are some examples of targeted marketing campaigns that insurers can create using geographic segmentation? A) Insurers can use geographic segmentation to create targeted campaigns that, for example, market renters insurance to city residents, earthquake insurance on the West Coast of the United States, and flood insurance in the Southeast.

The second step in the root cause analysis (RCA) process is charting the agents that directly result in one event triggering another event. In RCA, these agents are called

Causal factors Q) Shota is a risk manager who is investigating an industrial injury. The most apparent cause of the injury is lack of attention by the worker, but Shota suspects there are other factors that played a part in the incident. Which one of the following techniques would be most appropriate for Shota to use to identify the causes of the injury? A) Root cause analysis Q) Which one of the following statements about root cause analysis (RCA) is true? A) A root cause must produce effective recommendations for prevention of future accidents

Applying Classification Tree Analysis to Claims

Claims attributes can help develop classification trees, which, in turn, can classify and assign new claims according to target variables such as "complex"/"not complex," "fraud"/"no fraud," and "subrogation"/"no subrogation." By predicting these variables, the insurer can determine the appropriate resources needed to handle each claim as efficiently as possible. Q) In detecting claims fraud with data analysis, an insurer should be prepared to A) Reevaluate the attributes in a predictive model

Traditional data analysis techniques include classification trees, regression analysis, and cluster analysis.

Classification tree- A supervised learning technique that uses a structure similar to a tree to segment data according to known attributes to determine the value of a categorical target variable. The tree contains nodes; arrows; and leaf nodes, which indicate the values of the target variable. A classification tree is used to determine a categorical value for a target variable. Useful data analysis requires relevant data. For example, a risk manager seeking the likelihood of an employee to be able to return to work after an accident needs data regarding type of occupation currently held, age, qualifications for retraining, and available positions with lighter physical requirements. In this example, a classification tree could be used to segment workers compensation data.
Regression analysis- A statistical technique that is used to estimate relationships between variables. It is used to determine a numerical value for a target variable.
Cluster analysis- A model that determines previously unknown groupings of data. Cluster analysis is a technique for unsupervised learning. It is commonly used when an insurer knows the general problem it wants to solve but not the variables it must analyze to do so. The information gained through cluster analysis could then be used to develop a predictive model.
Q) In general, the easiest opportunities to target a market come from A) Using special approaches to market segments
Q) Which one of the following explains how classification tree analysis can be used to improve the claims handling process? A) Classification tree analysis can be used to assign new claims according to target variables
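A minimal scikit-learn sketch (hypothetical attributes and labels, not the course's dataset) of a classification tree that segments claims into "complex"/"not complex" target values:

```python
# Hypothetical sketch: segmenting claims with a classification tree to predict
# a categorical target variable ("complex" vs. "not complex").
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Attributes per claim: [claimant age, attorney involved (0/1), initial reserve]
X = np.array([
    [25, 0,  2_000], [62, 1, 45_000], [40, 1, 30_000],
    [33, 0,  1_500], [55, 1, 60_000], [29, 0,  3_000],
])
y = ["not complex", "complex", "complex", "not complex", "complex", "not complex"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# The printed rules correspond to the nodes, arrows, and leaf nodes described above.
print(export_text(tree, feature_names=["age", "attorney", "initial_reserve"]))
print(tree.predict([[48, 1, 50_000]]))  # predicted class for a new claim
```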

Two main types of data model creation are supervised learning and unsupervised learning. Both are used to find patterns in large datasets.

Cluster analysis- A model that determines previously unknown groupings of data.
Unsupervised learning- A type of model creation, derived from the field of machine learning, that does not have a defined target variable.
K-means- An algorithm in which "k" indicates the number of clusters and "means" represents the clusters' centroids.
Centroid- The center of a cluster.
Adverse development- Increasing claims costs for which the reserves are inadequate.
Long-tail claim- A claim in which there is a duration of more than one year from the date of loss to closure.
Generalized linear model (GLM)- A statistical technique that increases the flexibility of a linear model by linking it with a nonlinear function.
Target variable- The predefined attribute whose value is being predicted in a data analytical model.
Supervised learning- A type of model creation, derived from the field of machine learning, in which the target variable is defined.
Classification tree- A supervised learning technique that uses a structure similar to a tree to segment data according to known attributes to determine the value of a categorical target variable.
Complex claim- A claim that contains one or more characteristics that cause it to cost more than the average claim.
Information gain- A measure of the predictive power of one or more attributes.
Machine learning- Artificial intelligence in which computers continually teach themselves to make better decisions based on previous results and new data.
Recursively- Successively applying a model.
Data mining- The process of extracting hidden patterns from data that is used in a wide range of applications for research and fraud detection. Social media may be a useful source.
Nearest neighbor- The most similar instance in a data model.
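A brief k-means sketch in Python (illustrative data only) showing the clusters and centroids described in the K-means and Centroid definitions above:

```python
# Hypothetical k-means sketch: "k" is the number of clusters and "means" are
# the clusters' centroids. Features per claim are illustrative only.
import numpy as np
from sklearn.cluster import KMeans

# [days open, total paid] for a handful of closed claims
claims = np.array([
    [ 30,   2_000], [ 45,  3_500], [ 20,   1_200],   # short-duration, low-cost
    [400,  80_000], [520, 95_000], [610, 120_000],   # long-tail, high-cost
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(claims)
print(kmeans.labels_)           # cluster assignment for each claim
print(kmeans.cluster_centers_)  # the centroids (the "means" in k-means)
```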

Artificial intelligence (AI)

Computer processing or output that simulates human reasoning or knowledge. AI can quickly process large amounts of data—and make accurate decisions based on the results. In this way, it can play an integral role in predicting and preventing potential losses faced by an insured. AI is essentially a computerized simulation of how the human brain processes data. It involves programming machines to analyze big data and make predictions based on a set of rules or complex calculations. AI prevents losses by automating simple, routine tasks after assessing and predicting risk. For example, the ability to reroute shipments to avoid inclement weather or to shut down machinery that is about to overheat can limit claims for commercial insureds. The two layers that power artificial intelligence are machine learning and deep learning.
Machine Learning- A subset of AI. It involves computers using algorithms to parse data, gain insights from that data, and then make informed decisions based on what the data reveals. Further, the machines, or computers, continuously analyze new and existing data to improve their decision making.

Computer Vision

Without machine learning, finding the segment with the characteristics that best match each policy would be extremely time consuming. When a computer performs the task, the specificity of the attributes defining the segments reduces the need for subjective judgments and facilitates automated underwriting and pricing optimization. Typically, the easiest way for insurers to target products is to develop special sales tactics to sell existing products and services already being offered to a particular market segment. By taking this approach, insurers need not spend a great deal of time or effort creating special insurance products.

Computer vision is a technology that simulates human vision. It involves detecting, extracting, and analyzing images to give that object context and allow a machine to respond to it as a human would. This is accomplished through the development and use of algorithms that can automatically provide visual understanding.
Q) What is the benefit of computer vision technology? A) Computer vision technology allows a computer to glean high-level information from digital videos or images captured by cameras. It's a branch of artificial intelligence that permits automation of tasks that normally require human eyesight and decision making.
Although not a perfect science, computer vision is used in retail operations for automated checkout lanes; medical imaging; automotive safety, particularly related to automated vehicles; surveillance; and traffic control. Blending data science, engineering, statistics, and algorithms, it has become more specialized, accurate, and reliable over time, as related computer technologies have also advanced. One of the early tenets of computer vision was segmentation, or how images are seen and mapped. When combined with deep learning, this mapping of features through an algorithm relates to a commensurate action.
EXAMPLE- Parker International recently improved its workplace safety by using facial recognition software at all locations to identify all employees entering the building. This safety measure depends on which one of the following forms of emerging technology? A) Computer vision
EXAMPLE- Which one of the following terms describes technology that detects, extracts, and analyzes images in order to respond as a human would? A) Computer vision

Cross-Validation

Cross-validation is the process of splitting available data into multiple folds, or subsets, and then using different folds for training and holdout testing. The result is that all the data is used in both ways, and the predictive model is developed in several ways, allowing its developers to choose the version they believe performs best. The selected model can then be trained using the entire dataset.
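A minimal cross-validation sketch (synthetic data; logistic regression is just a stand-in model) showing data split into five folds, with each fold serving once as the holdout:

```python
# Minimal cross-validation sketch: split the data into k folds, train on k-1
# folds, and hold out the remaining fold for testing, rotating through all folds.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                     # hypothetical claim attributes
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200)) > 0

scores = cross_val_score(LogisticRegression(), X, y, cv=5)  # 5 folds
print(scores)         # holdout accuracy for each fold
print(scores.mean())  # overall estimate used to compare model versions
```

The fold-by-fold scores let the developers compare model versions before training the selected model on the entire dataset.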

Through data security, an insurer protects its data from corruption and illegal access and use. Data privacy, in comparison, involves defining what data is considered private and considering the risks involved if that data were to be compromised. The EU's GDPR requires businesses to protect the personal data and privacy of EU citizens for transactions that occur within EU member states and regulates the export of personal data outside the EU. U.S. insurers that process personal data of EU citizens must comply with the GDPR.

Data Privacy- Data privacy involves defining what data is considered private and understanding the risks that could compromise that data. There are two types of associated risk: individual risks, which vary according to the type of business and industry, and general risks, which may be categorized as operational or reputational.
Data Security- Insurers must implement and enforce policies to ensure that their insureds' information is not shared with others. This can help prevent, for example, a disgruntled employee from taking customer records to a competitor, breaching customer privacy. And without sufficient training, even employees with good intentions could be deceived into releasing private information.
General Data Protection Regulation- Nearly every point along the insurance value chain involves customer data, so insurers need to be aware of both their own data management policies and the laws and regulations related to disseminating personal data. One such edict is the European Union's (EU's) General Data Protection Regulation (GDPR). If a U.S. insurer processes personal data of EU residents, it must abide by the GDPR.
Managing Data Privacy Risks- U.S. insurers that process personal data of EU citizens must comply with the GDPR. They should therefore be familiar with how the GDPR defines the roles responsible for ensuring compliance. For example, the data controller determines how personal data is processed and the purposes for which it is processed and also ensures that outside contractors are in compliance. Other important roles noted in the GDPR include the data processor and data protection officer.

Data Review

Data for use in predictive modeling should be reviewed for reasonableness and consistency. Such reviews take into account whether and to what extent the data has previously been checked, verified, or audited; the purpose and nature of the source data; and any relevant constraints. Through its Actuarial Standard of Practice No. 23, the Actuarial Standards Board defines audits and reviews as:
Audit- A formal and systematic examination of data for the purpose of testing its accuracy and completeness.
Review- An examination of the obvious characteristics of data to determine if such data appear reasonable and consistent for purposes of the assignment. A review is not as detailed as an audit of data.
***While actuaries are required to review data, they are not required to audit it.***
When an insurer undertakes a large data-related project, a rigorous review of data is both expected and necessary.
Working with an actuary, Diane can now use a generalized linear model (GLM) to better project the average ultimate loss value (the target variable) for claims that are not yet closed but share these same attributes. This is a supervised learning technique. The data science team uses a classification tree technique to develop a predictive model based on the attributes of complex claims. With a machine learning algorithm, the computer recursively builds a tree by analyzing different splits in the values of attributes in various sequences.
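A hedged sketch of the kind of generalized linear model Diane might use: a gamma severity model with a log link fit with statsmodels on simulated data (the attributes, coefficients, and data are assumptions, not from the text):

```python
# Hypothetical GLM sketch: a gamma-distributed severity model with a log link,
# used to project expected ultimate loss values from claim attributes.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
attorney = rng.integers(0, 2, n)                  # claim attribute (0/1)
age = rng.integers(18, 80, n)                     # claimant age
mu = np.exp(7.0 + 0.8 * attorney + 0.01 * age)    # assumed expected severity
severity = rng.gamma(shape=2.0, scale=mu / 2.0)   # simulated ultimate losses

X = sm.add_constant(np.column_stack([attorney, age]))
glm = sm.GLM(severity, X, family=sm.families.Gamma(link=sm.families.links.Log()))
result = glm.fit()
print(result.params)  # fitted coefficients on the log scale
```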

Data Management Functions

An underwriting manager may be concerned primarily with such data as property values, loss ratios, and policy limits, but others may need that same data for accounting and management decisions. Accounting produces financial statements for management, but what about interest rates, market trends, taxation, and regulatory reporting? Owners and executives need much more than a balance sheet and an income statement. A comprehensive and effective data management program, therefore, involves processes to understand, cleanse, integrate, govern, and monitor data as a strategic asset. How can all this be accomplished? By incorporating these basic functions of a data management program into an organization's strategic plan:

Data governance- Provides definitions, standards, and procedures for how data is used. The greatest benefit is transparency. Data governance is the starting point, the set of rules and decisions for managing data. It provides the definitions, standards, and procedures for how data should be used and by whom. Moreover, it defines the controls and audit procedures that ensure policy and regulatory compliance.
Data preparation and capture- This includes internal data entry processes to capture accounting transactions, customer data, other operational information, and outside data sources such as research material or financial statistics.
Data access- This entails knowing where the data is and how it can be retrieved.
Data quality- Data quality involves processes that ensure data is accurate and usable for its intended purpose.
Data integration- This may be as simple as financial statements being dynamically updated each time an accounting transaction occurs. Data virtualization is a subset of data integration that allows users to generate a dynamic view of data without moving it or needing temporary or intermediary storage of it.

What's the difference between data mining and data dredging?

Data mining involves analyzing large amounts of data to find relationships and patterns that will help in developing business solutions; social media may be a useful source. Data dredging (also referred to as data fishing), on the other hand, is a deliberate search for any relationships between data—even those that are insignificant.
Q) Can you think of additional benefits of data management for an insurer? A) A comprehensive data management initiative offers many benefits, including improved productivity from internal resources (who no longer have to manage data in multiple locations) and the ability to revise and enhance data governance.

The Data Scientist

Data scientists must be able to analyze increasingly large amounts of new types of data, including text, geolocation data, sensor data, images, and social network data. Traditionally, actuaries have been the professionals who analyze data and make predictions based on those analyses for insurers. With strong backgrounds in mathematics and statistics, they focus primarily on pricing, ratemaking, and claim reserving. Data scientists, meanwhile, explore previously underused sources of data, such as social networks and new technology. Rather than being concerned directly with pricing and reserving, their work may lead to new insurance products and risk management techniques, as well as the refinement of existing products. There is no clear division between the roles of actuary and data scientist, however. Many actuaries are acquiring advanced computer-programming skills by using new data-analysis programming languages, such as R or Python, to supplement their mathematical and statistical knowledge.
Q) What three skill sets does an effective data scientist need? A) Having a strong base of knowledge in mathematics, statistics, and computer programming isn't all it takes to navigate the field of data science. Data scientists must also have domain knowledge—that is, knowledge of the discipline or activity to which insights from the data will be applied.

Risk Management and Insurance Data Analytics Decision-Making Model

Define risk management or insurance problem
Data >>> Analysis and Modeling >>> Insights >>> Actionable Decisions (Insights feed back to Data)
There are two basic approaches to data-driven decision making:
1) Descriptive- The descriptive approach is helpful when an insurer or risk manager has a specific problem that can be solved through data science. Once that problem is solved, the exact analysis is typically no longer needed. For example, let's say that an insurer changed its underwriting guidelines for auto insurance, reducing the accident-free time period required for coverage from five to three years. In this scenario, the insurer may use a descriptive approach to decision making the following year to determine whether this was a sound business decision.
2) Predictive- The predictive approach to data analytics creates a reusable method of providing information for data-driven decision making by humans, computers, or both. For example, automated underwriting for personal auto insurance is a predictive approach that's used each time a person applies for insurance. The computer makes the underwriting decision and issues a price quote.
Q) For which one of the following is it difficult to rely solely on traditional underwriting guidelines? A) Newly introduced products and emerging technologies

Potentially Unfair Modeling Variables

Despite the adoption of unfair trade practices acts and other regulatory measures designed to ensure fairness, some of the ratemaking variables used in predictive models may nonetheless result in biased decision making—and as insurers' models become more complex, it can become more difficult to determine how the model came to its conclusions. For example, many insurers use credit scores as the basis for credit-based insurance scores, which are then used in underwriting selection and pricing when reviewing applications for coverage. Several credit report characteristics factor into an insurance score, such as age of the oldest account, number of inquiries in 24 months, and ratio of total balance to total limits. Unlike credit scores, insurance scores are not intended to measure creditworthiness, but rather to predict the probability of a loss, and numerous studies have established the correlation between credit scores and loss potential. However, these scores don't account for the circumstances that may have caused someone's score to decline. An insured's score could be harmed by circumstances such as identity theft or predatory lending, or even a simple billing error. In addition, these scores can reflect racial bias and other unfairly discriminatory practices. When that's the case, large-scale use of credit scores in data models can then perpetuate these biases. For some lines of business, particularly products liability coverages for new products or emerging technologies, it's difficult for insurers to accurately assess loss exposures based solely on traditional underwriting guidelines. That's because if an insurer has little to no experience with a particular exposure, it may be unable to glean enough insights from its underwriting data to build a predictive model that helps determine the level of risk.

Differing theories of accident causation, including the domino theory and energy transfer theory, suggest distinct ways of removing those accident causes. The choice of a specific accident analysis technique, including job safety analysis and root cause analysis, depends, in part, on the assumptions made about how accidents are caused, prevented, or made less severe.

Domino Theory- The domino theory, sometimes referred to as the sequence of events theory, proposes that five accident factors can form a chain of events that lead in succession to a resulting accident and injury. Because each of the first four links of the domino theory leads directly to the next, removing any of them should, in theory, prevent the resulting injury from occurring. Removal of the third domino, the unsafe act and/or mechanical or physical hazard, is usually the best way to break the accident sequence and prevent injury or illness. Considering its emphasis on human fault, the domino theory is best applied to situations within human control. Although people can attempt to protect themselves and their property from a natural disaster such as a hurricane, preventing the cause of the accident is not within their control.
Energy Transfer Theory- An approach to accident causation that views accidents as energy that is released and that affects objects, including living things, in amounts or at rates that the objects cannot tolerate. The energy transfer theory posits that the basic cause of accidents is energy out of control. Its approach to preventing accidents or reducing the resulting damage focuses on controlling released energy and/or reducing the harm caused by that energy. Basic strategies of the energy transfer theory deal with maintaining safe distances between objects that may move at great speed, such as separating pedestrians and motor vehicles, and ensuring that objects that have the potential to move at great speeds also have the ability to slow themselves down or stop themselves completely, such as equipping elevators with emergency brakes or installing physical barriers within buildings to prevent the spread of fires or floods.

Risk Analysis

Q) How do mobile apps help level the playing field for smaller insurers looking to compete with bigger, more traditional insurers? A) A small insurer's mobile app takes up as much real estate on a customer's device as a large insurer's app. It can also offer the same conveniences and highly targeted products and services. For example, a small insurer can offer access to instant quotes, digital transactions, claims services, and customer support the same way a large insurer can. The key for smaller insurers is enticing customers to download and use their apps.

Empowered by rapidly evolving technology that can gather and process risk-related data in greater volumes and at greater speed than humanly possible, risk analysis provides the foundation for the "predict and prevent" philosophy underlying insurance products designed to prevent losses rather than just provide compensation for them. The risks analyzed as part of a "predict and prevent" strategy could pertain to a specific event, product line, project, or process. This strategy involves identifying the range of possible consequences that could result from the risk and determining the likelihood of their occurrence. The consequences of the risks relate to how each one might affect the insured's ability to accomplish its objectives. If an identified consequence is insignificant or very unlikely to occur, efforts to predict or prevent it may not be a worthwhile investment for the insurer. In contrast, if a single event could trigger multiple far-reaching consequences that affect many organizational objectives, insurers will likely spend significant time and resources to prevent them. For more complex consequences, several methods of analysis may be required to determine the level of risk involved.

Big Data Categories and Examples

Insurer decision making is inherently discriminatory in the strictest sense of the word: It entails separating acceptable risks from unacceptable ones and charging higher premiums for greater perceived risks. But it must not be unfairly discriminatory. In other words, it can't violate antidiscrimination laws by explicitly using as the basis for decisions identity-related factors that have been deemed protected. However, ambiguity can be introduced through hidden relationships in the data. For example, actuarially fair and accurate loss data may support assigning a certain level of risk to drivers who live in a particular zip code, but this kind of socioeconomic data may also be tied to race or another protected characteristic and therefore be considered morally questionable.

External Structured- Telematics, financial data, labor statistics
External Unstructured- Social media, news reports, internet videos
Internal Structured- Policy information, claims history, customer data
Internal Unstructured- Adjuster notes, customer voice records, surveillance videos

Fatima is a data scientist for an insurer. While analyzing claims data, she finds that insureds with a specific make of automobile have a higher incidence of theft than other insureds. Fatima would describe the relationship between automobile make and theft losses as

High gain, low entropy
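A worked Python example (assumed counts) of why Fatima's finding is described as high gain, low entropy: splitting thefts by automobile make lowers the weighted entropy of the split groups, which means the attribute carries high information gain.

```python
# Hypothetical sketch of the "high gain, low entropy" idea: splitting theft
# claims by automobile make reduces entropy, i.e., yields information gain.
import math

def entropy(p_positive: float) -> float:
    """Shannon entropy (in bits) for a two-class outcome."""
    if p_positive in (0.0, 1.0):
        return 0.0
    p, q = p_positive, 1.0 - p_positive
    return -(p * math.log2(p) + q * math.log2(q))

# Assumed portfolio: 1,000 insureds, 200 thefts overall.
parent = entropy(200 / 1000)

# Split on make: the targeted make has a much higher theft rate.
targeted  = entropy(150 / 200)   # 200 insureds with that make, 150 thefts
all_other = entropy(50 / 800)    # 800 other insureds, 50 thefts
children = (200 / 1000) * targeted + (800 / 1000) * all_other

print(f"entropy before split = {parent:.3f} bits")
print(f"entropy after split  = {children:.3f} bits")
print(f"information gain     = {parent - children:.3f} bits")
```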

Explainability in Insurer Data Modeling

In addition to being fair and accurate, insurers' predictive models must also be readily explainable. Explainability follows if a model's predictions can be easily interpreted and the modeling system can be easily repeated to produce identical results. ***Can be repeated to produce identical results*** The Oxford-Munich Code of Conduct helps professionals working with data navigate ethical quandaries. It states that, in some instances, accuracy and explainability can be competing interests, causing the modeler to prioritize one over the other based on how the model and its results will be used. In addition to customers, regulators demand transparency, further creating a need for insurers to disclose and justify the rationale behind premium increases. **Based on how the model and its results will be used**

Leverage

In model performance evaluation, the percentage of positive predictions made by the model minus the percentage of positive predictions that would be made in the absence of the model. Leverage, an alternative measure of a model's accuracy, examines the difference between these two percentages. For example, if the model makes positive predictions 50 percent of the time and positive predictions would be made 20 percent of the time without the model, leverage would be calculated in this way: 0.50 - 0.20 = 0.30.

Unfair Trade Practices Act

In the United States, insurance regulators in each state have adopted some form of an unfair trade practices act meant to protect both the insurance industry and the consumer. Most of these laws, based on the National Association of Insurance Commissioners' (NAIC's) model Unfair Trade Practices Act, prohibit unfair methods of competition, deceptive actions and practices, and underwriting practices that would result in unfair discrimination. Consumers can submit complaints about an insurer's unfair trade practices to the department of insurance (DOI) of the state where the activity in question occurred. If the DOI finds a complaint valid, the insurance commissioner may issue a cease-and-desist order barring the insurer from continuing the activity. The DOI may then hold a hearing, and if the insurer is successful in its defense, the DOI will remove the cease-and-desist order. If the DOI finds that the insurer violated the law, it may impose one or both of two types of penalty (fine per violation, and license suspension or revocation):
Fine per violation- Higher fines are imposed for activities conducted flagrantly and with conscious disregard of the law.
License suspension or revocation- This penalty may be imposed if the insurer's management knew or should have known that the activity was an unfair trade practice.
Q) Insurance Company must abide by the unfair trade practices acts of the many states in which it does business. Which one of the following would be most likely to constitute an unfair trade practice? A) Issuing a policy with rates that are not approved by the insurance department in that state

Risk manager Miguel uses telematics to help his employer predict and prevent losses. He is implementing a proactive risk management program to detect leaks and provide early warnings. Miguel is most likely to use

IoT water sensors

Continued

Job Safety Analysis- An analysis that dissects a repetitive task, whether performed by a person or machine, to determine potential hazards if each action is not performed. Job safety analysis (JSA) is one of the most universally applicable and versatile techniques for analyzing the cause of accidents. It involves breaking down each activity or operation into individual sequential steps, identifying hazards associated with each step, defining controls, and assigning responsibility for the implementation of each step (provided that the added benefit of the safety procedure outweighs its costs). JSA applies best to repetitive human tasks performed in an environment sufficiently stable to allow most hazards to be predicted. These can include positions along an assembly line or in an office where employees work at their desks most of the day. Repetitive tasks and person/machine systems are so common that JSA is applicable in almost every case in which a person must act safely to prevent bodily injury or property damage.
Root Cause Analysis- Accidents are often the result of a series of events or decisions. However, many of these can go unrecognized because the most obvious event or decision in the process is often declared the sole cause. Root cause analysis (RCA) is a process that enables the risk management professional to dig past the obvious causes of an accident to find other factors that played a role. After identifying all causal factors, the root causes are investigated. Each causal factor is inserted into a root cause map to determine its root cause.
Q) A systematic procedure that uses the results of other analysis techniques to identify the predominant determinants of an accident is called A) Root cause analysis

Characteristics of Quality Data

Let's examine some factors that determine a dataset's quality—and therefore its applicability to a given task:
Validity- Refers to relevance or suitability for a particular application; the same data can be valid in one analysis but not in another. In the context of our wildfire example, a dataset that included auto claims might not be considered valid, as it probably doesn't have much predictive value for homeowners claims.
Accuracy- Measures how well data represents true values and the business information being analyzed.
Completeness- Also called comprehensiveness, this is measured by whether the dataset delivers all of the variables that it purports to. For example, a dataset meant to represent every homeowners claim in rural New York for a particular year would be incomplete if it excluded results for February.
Reasonability- Refers to the data's materiality, taking into consideration applicable business conditions and whether, at a basic level, the data makes sense. If the injured parties in a set of auto accident claims, for instance, all appear to be over 100 years old, the data may not be reasonable.
Timeliness- Refers to whether the data is current enough for its intended use. For example, in analyzing claims data for the purpose of rate-setting for homeowners policies, Glenn discovers that much of the data in the sample is more than ten years old, so it may not be timely.
Lineage- Tracing data from its source to its destination in an effort to explain unexpectedly inconsistent or inaccurate outcomes.
Consistency- Measures the extent to which datasets stored in multiple locations match one another. Consistent data remains stable as it moves from one application to another, for example. Data consistency is crucial to the maintenance of backup and redundant files.

DSS (decision support system)

Models information to support managers and business professionals during the decision-making process Q) Greater American Insurance Company uses applications that analyze large amounts of data to supply its managers with suggested courses of action in dealing with business problems. Which one of the following acronyms describes these applications? A) DSS Decision-support systems (DSS) analyze large databases of business intelligence to supply managers with suggested courses of action (outputs) for business problems. These outputs are limited to ones that are best supported by the data. Although managers must still exercise good judgment to select the best solutions, the DSS can help improve the speed and quality of their decisions. Also, a manager can add to or change details of a problem and then rerun the query to produce more refined solutions. In addition to the speed with which DSS applications produce results, they easily transfer knowledge that has been gathered over time to new managers, resulting in more consistent, rational decision making.

Insurers can use predictive modeling to predict events and behaviors. Two concepts that drive predictive modeling are similarity and distance, which can be measured by nearest neighbors and link prediction. Information from social networks can reveal insight into customers' purchases, health, and risky behaviors. Centrality measures can also be used to examine connections. For example, how many friends a person has on a particular social media platform can be used to determine someone's degree of centrality to others. Closeness is a measure of the distance from these people (friends) to the central person—essentially, the similarity between them—and therefore how quickly information will travel between them.

Nearest neighbor- The most similar instance in a data model; data points closest to each other.
k nearest neighbor (k-NN)- An algorithm in which "k" equals the number of nearest neighbors plotted on a graph (a small sketch follows this list).
Class label- The value of the target variable in a model.
Link prediction- A prediction of the connection between data items.
Link function- A mathematical function that describes how the random values of a target variable depend on the mean value generated by a linear combination of the explanatory variables (attributes).
Q) Which one of the following is an algorithm used to group data into clusters of claims? A) K-means
K-means cluster analysis- The data scientist will most likely recommend using an unsupervised learning technique such as k-means cluster analysis because Emillia does not know the reason for the unusual development.
Betweenness- Measures the extent to which a person connects others. For example, Susan could be connected to Chris because they are both friends with Maria on a social media platform. Information that Susan posts on that platform could travel to Chris through Maria. If Maria connects many people in this way, she would be considered to have a high degree of betweenness.
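To make the k-NN idea concrete, here is a minimal sketch in plain Python; the dataset, attribute names, and function are hypothetical, not from the source. It classifies a new claim by majority vote among the k closest labeled claims, using Euclidean distance.

```python
from collections import Counter
from math import dist  # Euclidean distance between two points (Python 3.8+)

# Hypothetical training instances: (attributes, class label)
# attributes = (claim size in $000s, days taken to report the claim)
training = [
    ((5.0, 1), "legitimate"),
    ((6.5, 2), "legitimate"),
    ((40.0, 15), "suspicious"),
    ((55.0, 20), "suspicious"),
]

def knn_predict(new_point, training, k=3):
    """Return the majority class label among the k nearest neighbors."""
    neighbors = sorted(training, key=lambda item: dist(new_point, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

print(knn_predict((45.0, 12), training))  # prints "suspicious"
```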

Text mining

Obtaining information through language recognition. Much of the information used for data mining arrives as text. For insurers to turn that text into valuable insights, they must understand and employ text mining. Q) Durham Insurance has a special team working on identifying fraudulent auto liability claims. The team is using a claims analysis technique that scans claims file data for terms or names that repeatedly appear in the files of fraudulent claims. Which one of the following data analytics techniques is the team using? A) Text mining Software can clean data to check for errors, inconsistencies, and duplications and detect personally identifiable information (PII) embedded in data and remove it. The kind and quality of PII and medical information requested, collected, and maintained should be limited to what's necessary and pertinent to the claim, injury, condition, or situation.

Which one of the following steps undertaken by an analyst during a data review can be particularly helpful in detecting data anomalies?

Perform exploratory analysis
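Exploratory analysis often begins with simple summary statistics and outlier checks. A minimal, hypothetical sketch in Python (the claim amounts and the two-standard-deviation threshold are illustrative choices, not figures from the source):

```python
from statistics import mean, stdev

# Hypothetical claim amounts, one of which looks anomalous
claim_amounts = [1200, 950, 1100, 1300, 1250, 980, 45000, 1175]

mu, sigma = mean(claim_amounts), stdev(claim_amounts)

# Flag values more than two standard deviations from the mean
anomalies = [x for x in claim_amounts if abs(x - mu) > 2 * sigma]
print(anomalies)  # [45000]
```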

The Insurance Value Chain

Primary Activities- Marketing and distribution, underwriting, and claims
Support Activities- Legal and compliance, customer service, actuarial, reinsurance, premium auditing, human resources, special investigation units, information technology, accounting and finance, investments, risk control
Q) What are some ways predictive modeling based on data-driven decision making benefits an insurer? A) Predictive modeling improves underwriting accuracy, pricing precision, claims processing, fraud detection, and a variety of other business activities.
Results produced through data science are useful only if they are relevant to the business context. For example, unless insurers gain knowledge that helps them compete more effectively, they aren't receiving any benefits from data science. Similarly, to make data science a worthwhile endeavor for risk managers, they must be able to realize the benefits of employing new technologies, such as wearable devices, to improve safety or obtain insights about risk that will help them manage it more efficiently.

Construction and Engineering Management- Smart products also have many applications in construction and engineering: Motion sensors can be used for surveillance and security. Pressure sensors convert pressure or tension into a measurement of electrical resistance. Current sensors are used to protect electronic systems and batteries from heat buildup. Position sensors are used to activate components only when they are in the optimal location for a particular process to continue. Proximity sensors, slightly different from position sensors, respond when an object reaches an area within range of the sensor. For example, proximity sensors in wearables can detect when a person has entered a hazardous area and warn the person or a manager.

Property Management- Property managers can use wireless sensor networks (WSNs) to detect and respond to leaks and malfunctions or prevent on-site falls and injuries.
Supply Chain Management- Supply chain managers need to worry not only about the risk of product, service, or shipment disruptions to their own organizations but also about the downstream effects that interruptions cause to products, services, or shipments to customers. Many emerging technologies assist in predicting and controlling or preventing such losses.
Transportation Management- Transportation managers incorporate technologies from the Internet of Things (IoT) to connect vehicles and their drivers with solutions for safety, efficiency, and reliability.
Catastrophe Management- Sensors and WSNs are also used in catastrophe management. As long as a sensor can withstand a harsh environment, it can monitor the area for light, temperature, gases, precipitation, wind speed, water level, and more.
Workplace Safety Management- Safety managers have a vast array of smart products at their disposal to improve workplace safety and productivity. Thanks to wearables, for example, sensors can be incorporated into safety vests or other gear, leaving workers' hands free to do their jobs.
The customer service and satisfaction benefits of mobile technology are enormous. Some insurers use mobile apps to reduce the need for support and customer service staff, thereby lowering their overhead and, in turn, costs for consumers.

Q) First Rate Insurer wishes to learn whether homeowners policyholders under the age of 40 are more likely to experience fire losses in their homes. What type of model creation will First Rate use? A) First Rate will employ supervised learning, a type of data model creation in which the goal of the data mining task (the target) is known. Unsupervised learning is conducted when there is no defined target. Both types are used to find patterns in large datasets.

Q) Claims at First Rate have increased over the past six months. The increase has occurred across several different regions, and the claims manager wants to know whether a common factor is the cause. Based on the information the insurer currently has, which data modeling concept should the insurer use? A) At this point, with no specific target, a descriptive model would be most appropriate for studying the data surrounding the claims and looking for any patterns in it. If there is a high number of property damage claims among, for example, policyholders filing for divorce, First Rate could use a predictive model with recent divorce filings as an attribute and the number of property claims as the target variable. A lift curve could then be used to show whether the predictive model is effective.

The Scientific Method: Steps

Question, Research, Hypothesis, Experiment, Analyze the Data, Conclusion

In detecting claims fraud with data analysis, an insurer should be prepared to

Reevaluate the attributes in a predictive model
*********************************************************************
Specialized sensors include transducers, actuators, and accelerometers. These and other types of sensors are widely used in factories and many industries (such as construction, medicine, retail, and transportation).
One of the early tenets of computer vision was segmentation, or how images are seen and mapped. When combined with deep learning, this mapping of features through an algorithm relates to a commensurate action. Q) How can computer vision improve upon the ability of a single human to visually monitor risk factors? A) As opposed to a single human with one pair of eyes, many different cameras (stationary or video), looking at one fixed point or panning a certain area, can provide a more comprehensive and integrated view over time.
For more complex consequences, several methods of analysis may be required to determine the level of risk involved. Q) While there can be an unlimited number of specific causes of accidents, they can generally be sorted into three basic categories. What do you think those categories of accident causes are? A) The three basic categories of accident causes are poor management safety policy and decisions, personal factors, and environmental factors.

The Data Science Team

Results produced through data science are useful only if they are relevant to the business context. For example, unless insurers gain knowledge that helps them compete more effectively, they aren't receiving any benefits from data science. Similarly, to make data science a worthwhile endeavor for risk managers, they must be able to realize the benefits of employing new technologies, such as wearable devices, to improve safety or obtain insights about risk that will help them manage it more efficiently. Risk management and insurance professionals can be valuable members of data science teams. They help provide the context for the goals of data mining projects and how results can be applied to generate business solutions. For example, underwriters may inform the insurer that it's losing commercial business to competitors. The data science team may then design a data mining project to try to determine the reasons. Likewise, claims professionals may find that use of opioid medications is rising, and the data science team may be able to perform data mining to analyze the characteristics of medical providers and claimants involved in excessive opioid use. And risk managers may have information regarding an increase in repetitive motion injuries; data mining may provide more detailed information on how and where those injuries are occurring, and that information may lead to a solution, such as wearable safety devices.

After one employee was involved in several workplace accidents, the risk manager and the industrial safety engineer concluded that these resulted from the person's ancestry and social environment. Which one of the following accident causation theories or approaches considers this an accident factor?

Sequence of events theory

Big data

Sets of data that are too large to be gathered and analyzed by traditional methods. Q) Can you identify some privacy issues related to insurers' use of data? A) Privacy issues around obtaining mass information from public sources, including social media, may present legal and regulatory concerns. This is particularly true with data science projects used to discover and analyze previously unknown relationships between different data points. For example, an insurer might access individual items of data about a claimant from both internal and external sources. While the data from each source by itself may not violate the claimant's privacy rights, combining all the data could lead to conclusions that harm the claimant in a way that wasn't anticipated. As a result, privacy regulations have been passed throughout the United States to restrict various organizations' access to and use of certain types of personal information. Internationally, the European Union's (EU's) General Data Protection Regulation (GDPR) provides consumers with strict data privacy rights surrounding data consent, data portability, and breach notifications. United States-based organizations may have to comply with the GDPR if they do business with individuals in the EU. Additionally, insurers must be cautious when using voice analysis. While certain vocal indicators can suggest dishonesty, they may be the result of stress or anxiety.

In data modeling, how is similarity measured? Traditional data analysis techniques include classification trees, regression analysis, and cluster analysis. They are usually applied to structured data.

Similarity is measured as the distance between two instances' data points. A small distance indicates a high degree of similarity, and a large distance indicates a low degree of similarity. Measuring Similarity and Distance Similarity is the measure of how alike two data objects are. It can be extremely useful when looking for the relationships between data points (instances). For example, if an insurer can find past (closed) claims with similarities to new claims, it may be able to predict settlement amounts, defense costs, and the likelihood of fraud. Once the values of the attributes that make up an instance have been determined, a data point that represents the value of the instance's target variable is plotted on a graph. The distance between two or more instances' data points can then be measured to determine how similar they are. For the distance between two points to accurately depict their similarity, some adjustments may be required so that all relevant attributes do so proportionally. This ensures that no attribute influences the distance more than it should.
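As a rough illustration of why attributes may need adjustment before distances are compared, here is a hypothetical Python sketch that rescales each attribute to the 0-1 range before computing Euclidean distance, so that a large-valued attribute (claim size in dollars) does not dominate a small-valued one (days to report). The data and function are illustrative only.

```python
from math import dist

# Hypothetical claims: (claim size in dollars, days between loss and report)
claims = [(2_000, 1), (2_500, 2), (90_000, 30)]

def min_max_scale(points):
    """Rescale each attribute to [0, 1] so no attribute dominates the distance."""
    mins = [min(col) for col in zip(*points)]
    maxs = [max(col) for col in zip(*points)]
    return [
        tuple((v - lo) / (hi - lo) for v, lo, hi in zip(p, mins, maxs))
        for p in points
    ]

scaled = min_max_scale(claims)
print(dist(scaled[0], scaled[1]))  # small distance: similar claims
print(dist(scaled[0], scaled[2]))  # large distance: dissimilar claims
```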

Smart products can be used to predict risks and prevent losses in numerous areas, including property management, supply chain management, transportation management, catastrophe management, workplace safety management, and construction and engineering management. Before implementing any rate increases, Three Hills decides to use machine learning techniques to analyze segments of its homeowners customers, analyze the components of its loss ratios, improve underwriting guidelines, and more closely match premiums to loss exposures.

Smart product- An innovative item that uses sensors; wireless sensor networks; and data collection, transmission, and analysis to further enable the item to be faster, more useful, or otherwise improved.
Big data- Sets of data that are too large to be gathered and analyzed by traditional methods.
Wireless sensor network (WSN)- A wireless network consisting of individual sensors placed at various locations to exchange data.
Radio frequency identification (RFID)- A technology that uses radio frequency to identify objects.
Radio frequency identification (RFID) tag- A transponder that communicates with an antenna and transceiver (together called the reader) using radio frequency identification. Q) Which one of the following forms of emerging technology can Liam use to identify assets and their characteristics (such as quantity, location, and expiration date) in real time and without human intervention? A) RFID tags
Internet of Things (IoT)- A network of objects that transmit data to and from each other without human interaction. A source of a rapidly growing quantity of external data is the Internet of Things.
Lidar- A sensor similar to radar that uses infrared light to detect nearby objects.
Smart transportation- The integration of strategic vehicle management solutions with innovative technologies.
Q) Jacqueline and her team manage the inventory for a large shipping organization. To date, they have used barcodes to track and manage individual assets. Which one of the following explains why radio frequency identification (RFID) tags might be a better choice? A) RFID tracking happens in real time

Sensor-Equipped Products

Smartphone- Speed, location, direction, noise, temperature, proximity to hazards
Smartwatches- Heart rate, blood oxygen, noise, location, direction, g-forces
Wristbands, shirts, skin patches- Respiration, heart rate, energy output, body temperature, hydration
Footwear- Movement, location
Headgear- Location, direction, g-forces, brainwave activity, fatigue level
Vests- Motion, location
At times, information harvested from sensors can be useful in isolation. For example, if an employee's wearable indicates that the employee is fatigued, a supervisor can remove the employee from the job to reduce the chances that the employee will be involved in an accident. However, such data is most useful when combined with other data points to provide a holistic view of a situation, such as how environmental conditions, worker attributes, and equipment performance converge to affect the probability of an accident. The data generated from sensors such as wearables and telematics devices can be collected, aggregated, and analyzed to determine the probability of an accident. From there, insurance and risk management professionals can work to prevent accidents and mitigate the negative effects if one occurs.

The two main tent poles of ethical models are fairness and transparency. Q) What makes an insurer predictive model fair and transparent? A) A fair model allows an insurer to make risk and pricing decisions based on data that's both accurate and truly predictive of the expected future cost of coverage. A transparent model allows an insurer to demonstrate the rationale for risk-related decisions and illustrates how consumer data is collected and used. It counters the perception of insurers' algorithms making decisions based on results from so-called black boxes that generate mysterious calculations from consumer data.

Structured data- Data organized into databases with defined fields, including links between databases.
Unstructured data- Data that is not organized into predetermined formats, such as databases, and often consists of text, images, or other nontraditional media.
Premium- The price of the insurance coverage provided for a specified period.
Loss adjustment expense (LAE)- The expense that an insurer incurs to investigate, defend, and settle claims according to the terms specified in the insurance policy.
Allocated loss adjustment expense (ALAE)- The expense an insurer incurs to investigate, defend, and settle claims that are associated with a specific claim.
Unallocated loss adjustment expense (ULAE)- Loss adjustment expense that cannot be readily associated with a specific claim.
Salvage- The process by which an insurer takes possession of damaged property for which it has paid a total loss and recovers a portion of the loss payment by selling the damaged property.
Subrogation- The process by which an insurer can, after it has paid a loss under the policy, recover the amount paid from any party (other than the insured) who caused the loss or is otherwise legally liable for the loss.
Point of sale- A seamless transaction, documentation, and payment at the time and place of a sale.

Traditional data analysis techniques remain crucial to solving business problems, but also form the foundation of newer methods. Such techniques include classification trees, various types of statistical regression models, and cluster analysis.

Structured data- Data organized into databases with defined fields, including links between databases.
Node- A representation of a data attribute.
Arrow- A pathway in a classification tree.
Leaf node- A terminal node of a classification tree that is used to classify an instance based on its attributes.
Target variable- The predefined attribute whose value is being predicted in a data analytical model.
Linear regression- A statistical method to predict the numerical value of a target variable based on the values of explanatory variables; used when an analyst needs to determine the relationship between attributes and a target variable (see the sketch after this list).
Generalized linear model (GLM)- A statistical technique that increases the flexibility of a linear model by linking it with a nonlinear function.
Q) A data analytics technique that uses supervised learning and defined target variables to segment data according to known attributes and determine the probability of a workplace accident is A) A classification tree
Q) Alva is a data analyst for an insurer and uses techniques such as classification trees, linear regression, cluster analysis, and linear models. When Alva needs to determine the relationship between attributes and a target variable, she creates an algorithm using A) Linear regression
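A minimal linear regression sketch in Python, using hypothetical data (annual miles driven as the explanatory attribute and annual claim cost as the target variable); it fits a slope and intercept by ordinary least squares. The figures are illustrative only.

```python
# Hypothetical data: annual miles driven (attribute) vs. annual claim cost (target)
miles = [5_000, 8_000, 12_000, 15_000, 20_000]
claim_cost = [300, 420, 610, 750, 980]

n = len(miles)
mean_x = sum(miles) / n
mean_y = sum(claim_cost) / n

# Ordinary least squares: slope = covariance(x, y) / variance(x)
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(miles, claim_cost))
    / sum((x - mean_x) ** 2 for x in miles)
)
intercept = mean_y - slope * mean_x

# Predict the target variable for a driver with 10,000 annual miles
print(intercept + slope * 10_000)
```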

Insurance professionals should be familiar with basic modeling concepts, such as supervised and unsupervised learning, predictive and descriptive modeling, algorithms, entropy, and lift. Useful data analysis requires relevant data. For example, a risk manager seeking to estimate the likelihood that an employee will be able to return to work after an accident needs data regarding the type of occupation currently held, age, qualifications for retraining, and available positions with lighter physical requirements.

Supervised learning- A type of model creation, derived from the field of machine learning, in which the target variable is defined. A classification tree is a form of supervised learning.
Unsupervised learning- A type of model creation, derived from the field of machine learning, that does not have a defined target variable.
Predictive model- A model used to predict an unknown outcome by means of a defined target variable.
Descriptive model- A model used to study and find relationships within data.
Attribute- A variable that describes a characteristic of an instance within a model.
Instance (example)- The representation of a data point described by a set of attributes within a model's dataset.
Target variable- The predefined attribute whose value is being predicted in a data analytical model.
Information gain- A measure of the predictive power of one or more attributes (see the sketch after this list).
Entropy- A measure of disorder in a dataset.
Lift- In model performance evaluation, the percentage of positive predictions made by the model divided by the percentage of positive predictions that would be made in the absence of the model.
Q) Glenda, a claims manager, is tasked with using classification tree analysis to help with earlier prediction of claims that might turn out to be fraudulent. She analyzes a series of claims attributes and ranks them according to their relevance in predicting fraud. She is ranking the attributes according to which one of the following factors? A) Information gain. An important step after identifying the attributes of complex claims is ranking the attributes according to their relative information gain.
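To see how entropy and information gain can be computed, here is a small, hypothetical Python sketch: entropy measures disorder in the class labels, and information gain is the reduction in entropy after splitting on an attribute. The claims data and the "attorney involved?" attribute are invented for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels (0 = perfectly ordered)."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())

# Hypothetical claims labeled fraudulent (F) or legitimate (L)
parent = ["F", "F", "L", "L", "L", "L", "L", "L"]

# Split on a hypothetical attribute: was an attorney involved?
with_attorney = ["F", "F", "L"]
without_attorney = ["L", "L", "L", "L", "L"]

weighted_child_entropy = (
    len(with_attorney) / len(parent) * entropy(with_attorney)
    + len(without_attorney) / len(parent) * entropy(without_attorney)
)
information_gain = entropy(parent) - weighted_child_entropy
print(round(information_gain, 3))  # about 0.467
```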

These are the four fundamental concepts of data science:

Systematic processes can be used to discover useful knowledge from data.
Information technology can be applied to big data to reveal the characteristics of groups of people or events of interest.
Analyzing data too closely can result in interesting findings that are not generally useful.
Data mining approaches and results must be thoughtfully considered according to how the results will be applied.
Q) Besides sex, marital status, race, religion, national origin, and credit reports, what are some forms of data that consumers often deem unfair for use in risk selection and pricing? A) Some controversial forms of data include zip code; biometric and genetic information; purchase histories; telematics; online activities; social media posts; activities tracked by wearables; and information on injuries, disabilities, and medical conditions.

The growing amount of data and analytics resources allows insurers to fine-tune their marketing strategies—particularly in the areas of market segmentation and product targeting. Insurance leaders who understand how data can inform these activities can drastically improve marketing ROI.

Telematics- The use of technological devices to transmit data via wireless communication and GPS tracking.
Internet of Things (IoT)- A network of objects that transmit data to and from each other without human interaction. A source of a rapidly growing quantity of external data is the Internet of Things.
Affinity marketing- A type of group marketing that targets various groups based on profession, association, interests, hobbies, and attitudes.
Q) Which one of the following best describes a successful approach to product targeting? A) Combining special insurance products with a tailored approach to specific segment needs
Often, demographic segmentation is combined with other types of segmentation to create even smaller subgroups. For example, demographic variables can be combined with geographic data to identify marketing subgroups in a technique called geodemographic segmentation.

Data collected from wearables and other sensors can help create predictive models that analyze the relationships among myriad variables to improve the accuracy of predictions about accidents and identify the best ways to prevent them.

The Effect of Advanced Data Analytics- The data generated from sensors such as wearables and telematics devices can be collected, aggregated, and analyzed to determine the probability of an accident. From there, insurance and risk management professionals can work to prevent accidents and mitigate the negative effects if one occurs. The analysis in the example above illustrates how to use various combinations of attributes to predict the probability of an accident. Understanding how various biological traits, environmental factors, equipment failures, and other variables affect the probability of an accident helps insurers and organizations make risk management decisions regarding training regimens, labor divisions, and so forth to reduce accident probabilities.

Causal factors

The agents that directly result in one event causing another.

Data mining

The analysis of large amounts of data to find new relationships and patterns that will assist in developing business solutions. Ensuring fairness in predictive models used by insurers requires that the models be based on data mining. Q) Tania is a workers compensation claims manager at Millstone Insurance. She has noticed that the use of opioid medications is increasing significantly. Which one of the following data analytics approaches could be used to analyze the characteristics of medical providers and claimants involved in excessive opioid use? A) Data mining Q) David often uses data mining to explore the relationships between data types in his role as an actuary. After he selects data to be mined, it must then be A) Cleaned ********************************************************************* Claims data is collected when claims are opened, changes or payments are made, and claims are settled. Sometimes, it's also collected when claims are reopened or closed. Claims recovery data includes recoveries for salvage and subrogation. Claims recoveries are recorded as negative amounts in most claims databases. Other information is often contained in claims data, including injury type, body part affected, cause of loss, attorney involvement, whether an independent medical exam has been requested or conducted, and whether the claim was referred to a special investigation unit.

There are four general types of AI

The four main types of AI are automated intelligence, assisted intelligence, augmented intelligence, and autonomous intelligence.
Automated intelligence- Uses rules-based software to complete repetitive tasks that don't require human involvement. Because automated intelligence systems aren't adaptive, they don't learn from various types of decisions and don't look for ways to assist others in decision making. They're suited for well-defined tasks that don't require judgment, empathy, or complex calculations. A chatbot that can answer simple questions or resolve basic customer queries is an example of an automated intelligence application.
Assisted intelligence- Supports the work being done by humans. With assisted intelligence, a human makes the decisions, but the AI provides data to support the decision-making process or executes the final action determined by the human. While the data it provides may help humans make better or faster decisions, the AI isn't designed to improve the way the machine makes decisions.
Augmented intelligence- Takes things a step further than assisted intelligence to work collaboratively with humans to perform tasks or make informed decisions. Augmented intelligence uses advanced analytics to provide deep insights into data and make precise recommendations. However, final decision-making power still rests with humans. Augmented intelligence is also adaptive, meaning it continuously learns and improves.
Autonomous intelligence- Autonomous intelligence systems use machines that act on their own to complete tasks and make decisions without any involvement from humans. These systems also learn from previous decisions. Very few true autonomous intelligence systems are in use today, and widespread adoption appears to be years away.

Sebastian is a corporate risk manager who is developing a model to predict insureds who are more likely to make fraudulent claims. He has identified attributes that contribute to fraud and has been testing the model using holdout data. Historical data indicates that 5% of insureds file fraudulent claims, but the model predicts that 90% of insureds are likely to commit fraud. Sebastian concludes that

The model had overfit the data

One of the ways in which probabilities can be developed is theoretically. Which one of the following is an example of an event for which probability can be determined theoretically?

The number of times heads can be expected to turn up over multiple coin tosses

Telematics

The use of technological devices in vehicles with wireless communication and GPS tracking that transmit data to businesses or government agencies; some return information for the driver.

Theoretical and Empirical Probabilities

Empirical probabilities are only estimates. And to be accurate, the samples under study must be sufficiently large and representative. In contrast, theoretical probabilities are constant as long as the physical conditions that generate them (such as how many sides a die has) remain unchanged.

Theoretical probability- Probability that is based on theoretical principles rather than on actual experience. Empirical probability (a posteriori probability)- A probability measure that is based on actual experience through historical data or from the observation of facts. Probabilities can be developed from either theoretical data distributions or historical data. Theoretical probability is unchanging. It's commonly associated with events such as coin tosses or dice throws. For example, from a description of a regular coin or die, a person who has never seen either can calculate the probability of flipping a heads or rolling a four. All the information you need to determine a theoretical probability is right in front of you; you don't have to conduct an experiment, such as rolling dice to see how many times you get a certain number. ****Estimates whose accuracy depends on the size and representative nature of the samples being studied are Empirical probabilities**** Empirical probability (also referred to as experimental probability) is associated with historical data. For example, the probability that a male will die at age 68 is an empirical probability because it's estimated by studying male mortality data. These probabilities may change as new data is discovered or the environment that produces those events changes. Empirical probabilities are only estimates. And to be accurate, the samples under study must be sufficiently large and representative. In contrast, theoretical probabilities are constant as long as the physical conditions that generate them (such as how many sides a die has) remain unchanged. empirical probability distributions- They provide a mutually exclusive, collectively exhaustive list of outcomes
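A quick, hypothetical Python sketch of the distinction: the theoretical probability of heads is known from the description of a fair coin (0.5), while an empirical estimate comes from observing trials and changes with the sample.

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

theoretical_p_heads = 0.5  # known from the description of a fair coin

tosses = 10_000
heads = sum(random.random() < theoretical_p_heads for _ in range(tosses))
empirical_p_heads = heads / tosses

print(theoretical_p_heads)  # 0.5, constant
print(empirical_p_heads)    # close to 0.5, but only an estimate from this sample
```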

Market Segmentation

Through effective market segmentation, paired with target marketing and product targeting, an insurer improves its marketing return on investment (ROI). It does this by creating messaging and products optimized for—and targeted to—the audience most likely to be receptive to them. Product Creation and Targeting- Product targeting plays a key part in improving an insurer's marketing ROI. It often involves creating or seeking out a special insurance product to meet the specific needs of a target market. For example, some insurers are constantly designing new insurance products for niche market segments. Animal mortality coverage (a type of livestock insurance) and animal embryo coverage (insuring embryos during gestation or artificial transfer) are two examples of specialized coverages for niche markets. Insurers often have separate underwriting departments devoted to such business to expand their overall books of business. Some insurers may engage in affinity marketing to sell insurance to or through groups with similar interests or needs. Q) Producers frequently focus their marketing efforts on a specific group of consumers. This approach is known as A) Target marketing

Which one of the following explains why a computer recursively applies a model?

To analyze different splits in the values of attributes ******************************************************************** An insurer's governance, risk, and compliance programs create rules, processes, and controls that support the insurer's operating policies and strategic goals. Q) What are the benefits of these programs? A) They create transparency that offers the insurer's management and stakeholders a macro view of all the organization's daily activities and helps them identify potential credit, market, or operational risk exposures so that they can react quickly and appropriately.

Training and evaluating a predictive model involves separating data into training and holdout data, training the model, evaluating the model through various performance metrics, retraining it as needed, and then moving the model into production. Risk management and insurance professionals should be aware of the limitations of models, which can be too complex or not complex enough, and of the need to reevaluate them frequently. Q) Charles Construction is developing a predictive model for long-term disability injuries. It has collected data on 500 injured employees with certain attributes. It will use the data on 400 of those employees to train the predictive model. The data on the remaining 100 employees will be used later in the process to make sure that the model performs well on data that was not used in its development. The data on the remaining 100 employees is known as A) Holdout data

Training data- Data that is used to train a predictive model and that therefore must have known values for the target variable of the model.
Target variable- The predefined attribute whose value is being predicted in a data analytical model.
Class label- The value of the target variable in a model.
Instance (example)- The representation of a data point described by a set of attributes within a model's dataset.
Attribute- A variable that describes a characteristic of an instance within a model.
Overfitting- The process of fitting a model too closely to the training data for the model to be effective on other data. If the data used in a predictive model has too much complexity, it will not be accurate when data beyond the training data is applied to it.
Holdout data- In the model training process, existing data with a known target variable that is not used as part of the training data; it is used to test the model after it has been developed.
Generalization- The ability of a model to apply itself to data outside the training data.
Accuracy- In model performance evaluation, a model's correct predictions divided by its total predictions.
Precision- In model performance evaluation, a model's correct positive predictions divided by its total positive predictions.
Recall- In model performance evaluation, a model's correct positive predictions divided by the sum of its correct positive predictions and incorrect negative predictions.
F-score- In statistics, the measure that combines precision and recall and is the harmonic mean of precision and recall.
Confusion matrix- A matrix that shows the predicted and actual results of a model. (A small worked example of these metrics follows this list.)
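A small, hypothetical Python sketch of the evaluation metrics defined above, computed from the four cells of a confusion matrix for an imagined fraud model; the counts are invented for illustration.

```python
# Hypothetical confusion matrix counts for a fraud model
true_positive = 40    # predicted fraud, actually fraud
false_positive = 10   # predicted fraud, actually legitimate
false_negative = 20   # predicted legitimate, actually fraud
true_negative = 130   # predicted legitimate, actually legitimate

total = true_positive + false_positive + false_negative + true_negative

accuracy = (true_positive + true_negative) / total
precision = true_positive / (true_positive + false_positive)
recall = true_positive / (true_positive + false_negative)
f_score = 2 * precision * recall / (precision + recall)  # harmonic mean

print(accuracy, precision, round(recall, 3), round(f_score, 3))
# 0.85 0.8 0.667 0.727
```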

Technologies such as sensors and computer vision greatly improve an organization's ability to handle risk. Understanding the functions and value of these technologies is key to implementing them successfully.

Transducer- A device that converts one form of energy into another. Actuator- A mechanical device that turns energy into motion or otherwise effectuates a change in position or rotation using a signal and an energy source. Accelerometer- A device that measures acceleration, motion, and tilt.

What is typically the easiest way for insurers to target products?

Typically, the easiest way for insurers to target products is to develop special sales tactics to sell existing products and services already being offered to a particular market segment.

Raj, a claims manager for Great Insurance Company, uses cluster analysis in an effort to better predict the ultimate loss value for claims of a certain type. He discovers that most of the outlying claims in his analysis involved both attorney representation and hospitalization, and is able to use a generalized learning model to make better projections for future claims. In this scenario, which one of the following is the target variable?

Ultimate loss value

Insurers can use data analytics in many ways to improve the underwriting process. For example, they can use vehicle telematics to classify drivers and establish rates more accurately. They can also use data mining, cluster analysis, and predictive modeling to make sense of large amounts of seemingly random data. A wide variety of data analysis techniques can be applied to internal and external data to improve the underwriting process. Here, we'll discuss the application of telematics and data mining. However, there are many other ways that data analytics and underwriting intersect. Q) What are some different ways to collect vehicle telematics data? A) Vehicle telematics data can be collected through tracking devices installed in vehicles and GPS-enabled smartphone applications.

Usage-based insurance- A type of auto insurance in which the premium is based on the policyholder's driving behavior. ***Usage based insurance***
Data mining- The analysis of large amounts of data to find new relationships and patterns that will assist in developing business solutions. Social media may be useful.
Cluster analysis- A model that determines previously unknown groupings of data. Identifying the variables to analyze is the first step. Cluster analysis can be used to create clusters of similar claims according to various attributes, such as claim size, cause of loss, or type of injury/damage. It's a form of unsupervised learning. The cluster analysis uses k-means, which indicates the number of clusters within which to group the data, to organize data into clusters of claims closest in distance (and therefore similar) to each group's centroid.
Instance (example)- The representation of a data point described by a set of attributes within a model's dataset.
K-means- An algorithm in which "k" indicates the number of clusters and "means" represents the clusters' centroids (a small sketch follows this list).
Centroid- The center of a cluster.
Nearest neighbor- The most similar instance in a data model.
Attribute- A variable that describes a characteristic of an instance within a model.
Target variable- The predefined attribute whose value is being predicted in a data analytical model.
Predictive model- A model used to predict an unknown outcome by means of a defined target variable. A predictive model would not have been an appropriate tool to use before the cluster analysis because the insurer did not know what relationships it was looking for. Now that the intent of the model is clear, a predictive model can be developed.
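A compact, hypothetical Python sketch of k-means on one-dimensional claim sizes: it assigns each claim to the nearest centroid, recomputes centroids as cluster means, and repeats for a fixed number of iterations. Real cluster analysis would use many attributes and a library implementation; the data and starting centroids here are invented.

```python
# Hypothetical claim sizes in dollars
claims = [500, 700, 650, 9_000, 9_500, 10_200, 48_000, 52_000]

def k_means_1d(values, centroids, iterations=10):
    """Very small one-dimensional k-means; k is the number of starting centroids."""
    for _ in range(iterations):
        # Assign each value to the cluster whose centroid is nearest
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Recompute each centroid as the mean of its cluster
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

centroids, clusters = k_means_1d(claims, centroids=[1_000, 10_000, 50_000])
print(centroids)  # roughly: small, mid-size, and large (outlier) claim clusters
```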

The two main tent poles of ethical models are

fairness and transparency Q) After state regulators call into question some of its underwriting decisions, Insurance Company shares with those regulators both the data used to reach those decisions and how it was collected and used. This is an example of which one of the qualities of ethical data use? A) Transparency ********************************************************************* In such cases, data mining that involves a cluster analysis can help underwriters more accurately analyze and price loss exposures. The insurer will use its predictive model to review existing accounts and make appropriate underwriting decisions. Data Management Benefits- How do these benefits affect risk management? Sound risk management decisions are predicated on quality data. For example, access to up-to-date financial data such as interest rates and market trends can minimize financial risks. Loss statistics and demographic projections may affect strategic risks such as competitive standing or merger considerations. Keeping up on the latest technology announcements may influence operational risks associated with employee performance, morale, and more. Q) What are some advantages of effective data management? A) These are some of the benefits of a robust, comprehensive data management initiative: • Increased overall efficiency • Enhanced on-demand access to data • Improved decision-making

Data Quality Management

Validity- Is this within the acceptable range to reflect business expectations? Data is considered valid if it is, in fact, measuring what it purports to, is correctly stored and formatted, and conforms with any applicable internal data governance standards.
Accuracy- A particular dataset's accuracy, although nominally determined by whether the data is correct, is also measured by whether the form in which the data is presented is unambiguous and consistent (for example, the format in which date of birth is represented). Accuracy measures how well data represents true values and the business information being analyzed.
Completeness- Completeness is defined by how comprehensive the dataset is relative to what it claims to represent and the extent to which it can be used for its intended purpose.
Reasonability- Is the data at a level of consistency that mirrors business conditions within an acceptable range? A dataset is reasonable if it nominally makes sense. For example, if some of the data fields in a listing of U.S. zip codes contained letters, it would not be considered reasonable.
Timeliness- Timeliness refers to a dataset's relevance relative to its intended purpose; that is, whether it is sufficiently up to date for its results to be considered currently applicable. For example, in analyzing claims data for the purpose of rate-setting for homeowners policies, Glenn discovers that much of the data in the sample is more than ten years old, so it is not timely.
Data lineage- A dataset's lineage is, to the extent made possible by internal data governance policies, a history of its origin, the dates on which any changes occurred to it, and the nature of those changes. It is essentially the dataset's life story: tracing data from its source to its destination in an effort to explain unexpectedly inconsistent or inaccurate outcomes.
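A hypothetical sketch of how a few of these checks might be automated for a small claims dataset; the field names, records, and thresholds are illustrative only, not from the source.

```python
from datetime import date

# Hypothetical claim records
claims = [
    {"claim_id": "C1", "zip": "14201", "loss_date": date(2023, 5, 1), "amount": 1200},
    {"claim_id": "C2", "zip": "1420A", "loss_date": date(2009, 3, 7), "amount": 800},
    {"claim_id": "C3", "zip": "14850", "loss_date": date(2024, 1, 15), "amount": None},
]

today = date(2024, 6, 1)

for c in claims:
    issues = []
    if not (c["zip"].isdigit() and len(c["zip"]) == 5):   # reasonability check
        issues.append("zip code is not a 5-digit number")
    if c["amount"] is None:                                # completeness check
        issues.append("missing claim amount")
    if (today - c["loss_date"]).days > 10 * 365:           # timeliness check
        issues.append("data is more than ten years old")
    if issues:
        print(c["claim_id"], issues)
```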

Here are some examples of data management considerations associated with big data:

Volume- An enormous universe of data is now available, and it only continues to grow. An organization's data management plan needs to evolve accordingly and appropriately.
Variety- Big data includes a high volume of structured data. However, because big data comes from multiple sources, it also includes a lot of unstructured data, the management of which may differ from that of structured data.
Velocity- This is the constantly increasing speed at which data arrives. It also includes the growing rate of change in types of data.
Veracity- This refers to the completeness and accuracy of data. Unstructured big data is more likely to have less veracity than structured data. However, even traditional structured data is not perfect. Also, organizations can often gain useful information from big data, even if it has lower veracity.
Value- Value is derived from the results of data analysis to help organizations make better business decisions. Big data has great potential to add value, but it must be obtained and analyzed with techniques that provide meaningful results.

The Rise of Sensor Technology

Wearables, telematics devices, and other kinds of sensors are not new. For years, doctors have used wearable electrodes to record patients' cardiac activity. Flight data recorders have been used for decades to aid in the investigation of aviation incidents. But rapid advances in microprocessors, battery miniaturization and performance, camera clarity, data storage, and wireless connectivity have revolutionized this technology and made it more ubiquitous. Q) Rapid advances in which one of the following groupings has heralded a revolution in sensor and wearable technology? A) Microprocessor capacity, battery miniaturization, and camera clarity

Sensors and Sensor Networks When an insurer aligns its operational directives and information technology (IT) capabilities, it can harness advances in technology to achieve several enterprise-wide goals. These include creating competitive advantages; increasing operational efficiency; developing key insights to aid decision making; and facilitating governance, risk, and compliance initiatives.

When set up properly, sensors can provide a great benefit in commercial applications. For example, they can monitor workplace conditions and various environmental factors to ensure safe conditions for employees and equipment. However, every sensor on a network provides an access point for third parties who wish to hijack or disrupt the system, or to eavesdrop. To reduce network traffic, some sensor networks employ a technique called data aggregation. These networks are designed with the assumption that each sensor node (gathering encrypted data) is unsecured; however, collected data from all nodes is aggregated at a secure base station. Q) Sensors are categorized by their functions and applications for risk assessment and control. What are the four categories of sensors? A) The four categories of sensors are mechanical, thermal, radiant, and biochemical. The best product targeting results are typically generated when insurers offer solutions to a market segment by combining special insurance products with sales approaches tailored to that segment's specific needs. For example, to market to volunteer fire departments, an insurer may use specially prepared insurance products and an individualized approach to assessing the departments' risks (such as a risk management survey created for fire departments).

Why must text be both retrieved and prepared in the first step of the text mining process?

When text is retrieved from a group of claim files, it's often not clean enough for use in a model because people frequently misspell words or use abbreviations. Plus, it's likely that the text contains articles, conjunctions, prepositions, and suffixes that provide little value and must be eliminated during preprocessing. Computers can also be trained to ignore certain words, recognize abbreviations, and look for keywords relevant to the situation being studied. Q) How could an insurer use structured data such as that produced in the text-mining example? A) Cleaned-up text can be used in a variety of models. For example, suppose that an insurer has experienced a recent increase in the cost of claims. By performing a cluster analysis on claims notes, the insurer may discover new fraud indicators. From there, claim files can be analyzed to discover which documents are nearest neighbors to fraudulent claim files. This can be an indicator of fraud. A predictive model can then be built using the fraud-correlated terms as attributes to identify claims that are potentially fraudulent.
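A rough, hypothetical Python sketch of the retrieval-and-preparation step: lowercase an adjuster's note, keep only words, expand a few known abbreviations, and drop common stop words (articles, conjunctions, prepositions). Real text mining pipelines use dedicated NLP libraries; the note, stop-word list, and abbreviation table here are invented for illustration.

```python
import re

STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "was", "with"}
ABBREVIATIONS = {"atty": "attorney", "clmt": "claimant", "mva": "motor vehicle accident"}

def prepare_text(note: str) -> list[str]:
    """Clean a claim note into a list of tokens usable by a model."""
    tokens = re.findall(r"[a-z']+", note.lower())        # lowercase, keep words only
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]   # expand known abbreviations
    return [t for t in tokens if t not in STOP_WORDS]    # drop low-value words

note = "Clmt was in an MVA on 5/1 and retained an atty."
print(prepare_text(note))
# ['claimant', 'motor vehicle accident', 'retained', 'attorney']
```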

cost-benefit analysis

A study that compares the expected costs of a course of action with its expected benefits. Q) Lainie, a data manager for Insurance Company, is tasked with assessing whether the company should adopt a new technological solution. To do this, she first analyzes both the positive and negative results that are likely to occur as a result of making the change. This is called a(n) A) Cost-benefit analysis
The domino theory, sometimes referred to as the sequence of events theory, proposes that five accident factors can form a chain of events that lead in succession to a resulting accident and injury. Because each of the first four links of the domino theory leads directly to the next, removing any of them should, in theory, prevent the resulting injury from occurring. Removal of the third domino, the unsafe act and/or mechanical or physical hazard, is usually the best way to break the accident sequence and prevent injury or illness. Considering its emphasis on human fault, the domino theory is best applied to situations within human control. Although people can attempt to protect themselves and their property from a natural disaster such as a hurricane, preventing the cause of the accident is not within their control.

Cluster analysis can be used to create clusters of similar claims according to various attributes, such as claim size, cause of loss, or type of injury/damage.

Analysts can then examine the claims in the outlier cluster and discover that the outliers share three additional attributes:
1) They were reported more than one week after the date of the accident.
2) Liability was originally denied by the third-party claims representatives.
3) All claimants were represented by an attorney.
Q) Park Slope Baking recently began manufacturing a line of gluten-free products. Caroline is the products liability underwriter for the account and is not sure how concerned she should be about the new product line. She knows that gluten-free products have become more popular recently, but she does not know if there have been any products liability problems. Which one of the following is an appropriate technique that data scientists can use to help Caroline discover emerging risks with gluten-free products? A) Cluster analysis. Advances in cluster analysis techniques are enabling insurers to improve the accuracy of loss estimates for long-tail claims and more effectively detect fraudulent claims.
*********************************************************************
Each step of the claims process offers opportunities for analysis-driven improvements. Some of the most prevalent techniques driving such improvements are cluster analysis, classification tree analysis, and text mining.

Two main types of data model creation are

supervised learning and unsupervised learning. Both are used to find patterns in large datasets. Q) Adam analyzes data for a large P&C insurer. He has been given a project to mine data to group its insureds into rankings based on loss statistics. Adam is most likely to use which one of the following types of data model creation to complete this project? A) Unsupervised learning In many cases, insurers want to determine a numerical value for a target variable rather than, as in the previous example, a categorical value. In these cases, analysts use mathematical functions, such as those used in statistical regression techniques, to create algorithms. One of these methods is linear regression. Another type of linear model that can be used for more complex data classification is a generalized linear model. (This should not be confused with a general linear model, which is a broad group of different types of linear models.) Some accounting data required for analysis may not be specific to any one policy or product. Underwriting expenses and ULAE are examples of this; they can be tracked at the aggregate level.

