Insurance Data Analytics
Which of the following is NOT a fundamental concept of data science? -Systematic processes can be used to discover useful knowledge from data. -Analyzing data too closely can result in interesting findings that are not generally applicable. -A data mining approach can be applied in a meaningful way to any context. -The analysis of Big data can yield characteristics of groups of people or events of interest.
-A data mining approach can be applied in a meaningful way to any context. (The selection of data mining approaches and evaluation of the results must be thoughtfully considered according to how the results will be applied)
Why would an insurance professional who is not an actuary or data scientist want to learn about data analytics? -Big data and technology are central to the future of the insurance industry. -Insurance professionals must be able to communicate with data scientists in their organizations. -Results from data analytics can inform decisions across all areas of insurance company operations. -All of the above are true.
-All of the above are true.
data science results
-Automating decision making for improved accuracy and efficiency -Organizing large volumes of new data -Discovering new relationships in data -Exploring new sources of data
Four fundamental concepts of data science
-Systematic processes can be used to discover useful knowledge from data -Information technology can be applied to big data to reveal the characteristics of groups of people or events of interest -Analyzing data too closely can result in interesting findings that are not generally applicable -The selection of data mining approaches and evaluation of the results must be thoughtfully considered according to how the results will be applied
Which of the following is an example of unstructured data? -Data that is provided in an Excel spreadsheet. -A Stata dataset. -Information on a website. -All of the above are forms of unstructured data.
-Information on a website.
Hierarchical clustering groups data Select one: A. According to similarities. B. By identifying outliers. C. According to differences. D. Into two large circles.
A
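As a rough illustration of grouping data according to similarities, the sketch below performs single-linkage agglomerative (hierarchical) clustering in plain Python. Every value starts in its own group, and the two most similar groups are merged repeatedly until the requested number of groups remains. The function name, the choice of single linkage, and the claim amounts are hypothetical assumptions for illustration, not part of the course material.

```python
def agglomerative_cluster(values, k):
    """Single-linkage agglomerative clustering of 1-D values into k groups."""
    # Start with every value in its own cluster.
    clusters = [[v] for v in values]
    while len(clusters) > k:
        best = None
        # Find the pair of clusters whose closest members are most similar.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge the most similar pair; j > i, so popping j leaves index i valid.
        clusters[i].extend(clusters.pop(j))
    return [sorted(c) for c in clusters]

# Hypothetical claim amounts (in $ thousands), made up for illustration.
amounts = [1.0, 1.2, 1.1, 9.8, 10.0, 25.0]
print(agglomerative_cluster(amounts, 3))  # [[1.0, 1.1, 1.2], [9.8, 10.0], [25.0]]
```

Single linkage (distance between the closest members of two groups) is only one way to measure similarity; complete or average linkage are common alternatives.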
Internet of Things (IoT)
A network of objects that transmit data to computers.
actuary
A person who uses mathematical methods to analyze insurance data for various purposes, such as to develop insurance rates or set claim reserves.
regression analysis
A statistical technique that predicts a numerical value given characteristics of each member of a dataset.
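A minimal sketch of the simplest form of regression analysis, assuming one numerical characteristic (x) is used to predict one numerical value (y): ordinary least squares fits a straight line through the data. The function name and the data points are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x with one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept makes the line pass through the point of means.
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data: x could be a characteristic such as hours driven,
# y the numerical value to predict. Values are made up for illustration.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
a, b = fit_line(xs, ys)
print(a, b)       # 1.0 2.0
print(a + b * 5)  # predicted value for x = 5: 11.0
```

Real insurance regression models typically use many characteristics at once, but the idea of predicting a number from fitted coefficients is the same.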
Which one of the following is a data governance committee (DGC) responsibility? Select one: A. A data governance committee ensures there are few conflicts or redundancies in data standards and practices. B. A data governance committee both retrieves and prepares metadata for use by an organization. C. A data governance committee plays a key role in project management for data projects. D. A data governance committee is charged with monitoring the volume of big data within an organization.
A. A data governance committee ensures there are few conflicts or redundancies in data standards and practices. (This is achieved by having cross-functional representation of all major stakeholders and eliminating potential conflicts early in the decision-making process involving data systems.)
Which one of the following is correct regarding an insurer's big data? Select one: A. An insurer's big data consists of both its own internal data and external data. B. Most insurers do not currently possess any big data. C. An insurer's big data consists only of its own internal data. D. An insurer's big data consists only of new types of external data.
A. An insurer's big data consists of both its own internal data and external data.
To gain a competitive advantage, maintain profitability, and satisfy customers an organization must Select one: A. Be able to trust its data. B. Have an effective risk management program. C. Pay attention to the marketplace. D. Adopt current accounting rules.
A. Be able to trust its data.
Which one of the following is true regarding data quality? Select one: A. Data quality is a relative, not an absolute, concept. B. It is reasonable to assume that external data comes from reliable sources. C. Quality data retains its quality regardless of how it is used. D. Claims representatives primarily use quality data for pricing decisions.
A. Data quality is a relative, not an absolute, concept.
Which one of the following is correct about data science? Select one: A. Data science is a new field that arose from the need to link big data and technology to provide useful information. B. Data science is traditional computer programming that is applied to larger amounts of data. C. Data science is a field within mathematics that uses new mathematical concepts to analyze data. D. Data science automates all data processing and eliminates the need for human involvement.
A. Data science is a new field that arose from the need to link big data and technology to provide useful information.
Which one of the following is a way that insurers and risk managers can use data science to improve their results through data-driven decision making? Select one: A. Discovering new relationships in data B. Providing human analysis of data C. Determining prior year losses at a particular location D. Using industry data in addition to the organization's own data
A. Discovering new relationships in data
Technology that can particularly assist adjusters in evaluating claims after catastrophes is Select one: A. Drones. B. Telematics. C. Sensors. D. Wearables.
A. Drones.
Which one of the following can be applied over time to refine a model to better predict results? Select one: A. Machine learning B. Regression C. Statistics D. Association rule learning
A. Machine learning
Wycliffe Insurance is very concerned about data quality and has many safeguards in place to ensure the data it collects and stores is managed appropriately. New claims data is entered with the date of its arrival in the department. Then the claims representative's activities are also entered with the date and time whenever the file is updated. The organization has chosen this data formatting to reflect the required degree of accuracy that has proven many times to be beneficial when the data is used in settlement negotiations or arbitration hearings. The dimension of stored data quality used in this case by Wycliffe is Select one: A. Precision. B. Flexibility. C. Granularity. D. Organizational consistency.
A. Precision.
Which one of the following is a legal and regulatory concern in obtaining mass information from social media? Select one: A. Privacy B. Fraud C. Volume D. Veracity
A. Privacy
Which one of the following is correct regarding data science teams in insurance and risk management organizations? Select one: A. Risk management and insurance professionals can be valuable members of data science teams. B. Data scientists and computer programmers are the only useful members of data science teams. C. Data scientists are more effective working alone than in teams. D. Data scientists and actuaries are the only useful members of data science teams.
A. Risk management and insurance professionals can be valuable members of data science teams.
Which one of the following is an element of a data security program? Select one: A. Storing data back-ups off site. B. Installing agile project management. C. Implementing a data governance program. D. Increasing the overall efficiency of data systems.
A. Storing data back-ups off site.
A project team at Goshen Mutual has been working on developing a new product for the personal insurance market. The team is using geodemographic data to determine which territories would be the best to introduce the product. Geodemographic data would be categorized as Select one: A. Structured external data. B. Unstructured internal data. C. Structured internal data. D. Unstructured external data.
A. Structured external data.
Which one of the following statements is correct regarding the personal data and privacy positions of the European Union (EU) and the U.S.? Select one: A. The EU has one all-encompassing data protection framework and the U.S. has several more targeted privacy laws. B. U.S. companies are required to comply with the EU's General Data Protection Regulation (GDPR) only if they have employees in the EU. C. The U.S. has a stronger cultural expectation of privacy than the EU. D. Class-action lawsuits over privacy are commonplace in the EU, but rare in the U.S.
A. The EU has one all-encompassing data protection framework and the U.S. has several more targeted privacy laws.
In terms of data quality principles, validity is defined as Select one: A. The accuracy of data within predefined and accepted parameters or values. B. The extent that each dataset contains all elements necessary for business needs. C. The process of tracing data from its source to its destination. D. The true value of data relative to the business information being analyzed.
A. The accuracy of data within predefined and accepted parameters or values.
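To make "within predefined and accepted parameters or values" concrete, here is a minimal validity check, assuming a policy record with two fields. The field names, the list of accepted states, and the age range are hypothetical parameters chosen for illustration; an insurer's actual rules would come from its own data standards.

```python
# Hypothetical accepted parameters for two fields of a policy record.
VALID_STATES = {"PA", "NJ", "NY"}
AGE_RANGE = (16, 100)

def validity_errors(record):
    """Return the fields whose values fall outside the predefined accepted values."""
    errors = []
    # Categorical field: value must be one of the accepted codes.
    if record.get("state") not in VALID_STATES:
        errors.append("state")
    # Numerical field: value must be an integer within the accepted range.
    age = record.get("driver_age")
    if not isinstance(age, int) or not (AGE_RANGE[0] <= age <= AGE_RANGE[1]):
        errors.append("driver_age")
    return errors

print(validity_errors({"state": "PA", "driver_age": 40}))  # []
print(validity_errors({"state": "ZZ", "driver_age": 12}))  # ['state', 'driver_age']
```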
All of the following are fundamental concepts of data science, EXCEPT: Select one: A. The data mining project need not provide actions that lead to better business results in order to be considered worthwhile. B. Information technology can be applied to big data to reveal characteristics of people or events of interest. C. Close analysis may result in substantive findings that may not necessarily lead to actionable conclusions. D. After information is gleaned, the user must have a process to evaluate the accuracy of any conclusions drawn.
A. The data mining project need not provide actions that lead to better business results in order to be considered worthwhile.
Which one of the following is a fundamental concept of data science? Select one: A. The selection of data mining approaches must be considered in the context in which the results will be applied. B. Information technology cannot be applied to big data, and experimental approaches are used. C. Random exploration is the best approach to discover useful knowledge from data. D. Analyzing data as closely as possible is likely to produce the most generally applicable results.
A. The selection of data mining approaches must be considered in the context in which the results will be applied.
Which one of the following is a characteristic that differentiates big data from traditional data? Select one: A. Velocity B. Structure C. Fraud D. Privacy
A. Velocity
Cross Industry Standard Process for Data Mining (CRISP-DM)
An accepted standard for the steps in any data mining process used to provide business solutions.
Data Science
An interdisciplinary field involving the design and use of techniques to process very large amounts of data from a variety of sources and to provide knowledge based on the data.
Data-quality principles
Appropriateness Reasonableness Comprehensiveness Material limitations and alternatives Sampling methods
Machine learning
Artificial intelligence in which computers continually teach themselves to make better decisions based on previous results and new data.
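One simple way to picture a model that improves as new results arrive is online learning. The sketch below, assuming a one-variable linear model, nudges its parameters by stochastic gradient descent each time a new observation comes in. The class name, learning rate, and training data are hypothetical; production machine learning systems are considerably more involved.

```python
class OnlineLinearModel:
    """Predicts y as w * x + b and refines w and b one observation at a time."""

    def __init__(self, learning_rate=0.01):
        self.w = 0.0
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        # Gradient step on the squared error for this single new observation.
        error = self.predict(x) - y
        self.w -= self.lr * error * x
        self.b -= self.lr * error

# Hypothetical stream of results that follow y = 2x + 1.
model = OnlineLinearModel(learning_rate=0.05)
for _ in range(1000):
    for x in (0, 1, 2, 3):
        model.update(x, 2 * x + 1)
print(round(model.predict(4), 2))
```

Each call to `update` uses only the newest result, which is what lets the model keep adjusting as new data arrives.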
The important first step in a decision-making model is to Select one: A. Purchase the technology. B. Define the problem. C. Assign a data scientist. D. Prepare the data.
B. Define the problem.
A privacy impact assessment (PIA) is Select one: A. A collaborative tool that facilitates workflows. B. A tool used to identify and assess privacy risks C. An example of metadata that defines key data attributes. D. Proprietary software used to detect malware.
B. A tool used to identify and assess privacy risks. (A privacy impact assessment (PIA) can identify and assess privacy risks as well as identify whether information collected complies with legal and regulatory privacy requirements.)
The insurance professionals who have traditionally analyzed data and made predictions based on their analyses are Select one: A. Data scientists. B. Actuaries. C. Computer programmers. D. Claims professionals.
B. Actuaries.
Data scientists must be able to Select one: A. Apply only traditional techniques in new ways to big data. B. Analyze traditional data as well as new types. C. Apply the theories of new physics to data. D. Analyze new types of data only.
B. Analyze traditional data as well as new types.
There are two types of associated risk for data privacy, individual and general risk. General data privacy risk Select one: A. Is of specific concern to the European Union. B. Can be categorized as operational or reputational. C. Involves legal and regulatory requirements. D. Varies by the type of business or industry.
B. Can be categorized as operational or reputational.
The lifeblood of every organizational function is Select one: A. Risk management. B. Data. C. Regulation. D. Employees.
B. Data.
In terms of data governance, IT employees hold the role of Select one: A. Data stewards. B. Data custodians. C. Rule developers. D. Compliance regulators.
B. Data custodians. (IT employees, including architects, are charged with managing the flow of data for an organization. This contrasts with the role of data stewards, who develop business rules based on the data model that IT employees develop.)
Reynolds Insurance provides workers' compensation insurance for small- to medium-sized companies, mostly to its local and regional manufacturers. For data science to be relevant and useful to Reynolds, which one of the following types of technology would be its best investment? Select one: A. Placing telematics in employees' personal automobiles to better identify driving behaviors. B. Harnessing the Internet of Things data to analyze data from wearables such as hardhats and steel-toed boots. C. Analyzing data to compare older and younger workers' preferences for longer shifts but shorter work weeks. D. Classifying employees by various demographics to reveal the likelihood of lawsuits against their organization.
B. Harnessing the Internet of Things data to analyze data from wearables such as hardhats and steel-toed boots.
Which one of the following defines individual risk? Select one: A. Individual risk may be categorized as operational. B. Individual risk varies according to the type of business. C. Individual risk is reputational in nature. D. Individual risk is defined by the data governance committee.
B. Individual risk varies according to the type of business.
Which one of the following correctly describes a new evolution of big data, sometimes referred to as big data 2.0 in insurance and risk management? Select one: A. Insurers are using the Internet to market their products to customers. B. Organizations can strategically use data from the Internet of Things and vehicle telematics. C. Insurers are exploring the possibility of underwriting personal lines over the Internet. D. Organizations have just started conducting business on the Internet.
B. Organizations can strategically use data from the Internet of Things and vehicle telematics.
The first step in the data mining process is to Select one: A. Prepare the data that will be used. B. Understand what a business wants to achieve. C. Select a data mining technique. D. Collect the data that will be used.
B. Understand what a business wants to achieve.
Insurers can benefit from Select one: A. Analyzing data very closely to locate the most applicable findings. B. Discarding traditional data analysis and using only new data science techniques. C. A framework they can use to approach problems through data analysis. D. Applying the theories of physics to complex insurance data.
C. A framework they can use to approach problems through data analysis.
Greatview Insurance wants to predict which auto liability claims will most likely go to litigation, so it can assign them to experienced adjusters early in the process. There are certain known indicators of litigation that Greatview wants to use in the data mining process. Which one of the following data mining techniques would Greatview's analyst most likely use? Select one: A. Association rule learning B. Cluster analysis C. Classification D. Regression analysis
C. Classification
Which one of the following is a data mining technique an insurer applies when it knows what information it wants to predict? Select one: A. Association rule learning B. Machine learning C. Classification D. Cluster analysis
C. Classification
Which one of the following correctly describes classification? Select one: A. Classification predicts a numerical value given characteristics of each member of a dataset. B. Classification develops algorithms to develop rules to apply to new data. C. Classification assigns members of a dataset into categories based on known characteristics. D. Classification explores data to find groups with common and previously unknown characteristics.
C. Classification assigns members of a dataset into categories based on known characteristics.
Which one of the following functions of a data management program would allow accounting transactions to automatically update an organization's financial statements? Select one: A. Data preparation B. Data governance C. Data integration D. Data access
C. Data integration
Internal data entry processes that capture accounting transactions, customer data or other operational transactions are called Select one: A. Data governance. B. Data quality. C. Data capture. D. Data integration.
C. Data capture. (Data preparation and capture cover a business's day-to-day transactions.)
Which one of the following is a basic process in any data security program? Select one: A. Establish a data governance committee (DGC). B. Establish metrics for timeliness of data refresh in systems. C. Develop and enforce stronger password protocols. D. Perform random sampling of data for accuracy.
C. Develop and enforce stronger password protocols.
Technology that can particularly assist adjusters in evaluating claims after catastrophes is Select one: A. Telematics. B. Sensors. C. Drones. D. Wearables.
C. Drones.
Durham Insurance insures many trucking operations and has a large database of loss experience. It wants to know how many hours a driver can drive without taking a break before the likelihood of having an accident increases. Which one of the following data mining techniques would an analyst use to predict the number of hours? Select one: A. Cluster analysis B. Classification C. Regression analysis D. Association rule learning
C. Regression analysis
Gabrielle is a claims representative from Onward Insurance. She is using external data to verify some information as part of a claims investigation. In doing so, Gabrielle will likely rely on all of the following sources of information, EXCEPT: Select one: A. Economic data B. Geodemographic data C. The claimant's recorded statement given to Gabrielle D. Credit ratings obtained with permission
C. The claimant's recorded statement given to Gabrielle (This is internal data, not external data.)
The descriptive approach is applied Select one: A. To process information received from the Internet of Things. B. Repeatedly to provide information for data-driven decision making. C. When an insurer or risk manager has a specific problem. D. When an insurer or risk manager is deciding what type of computer technology to purchase.
C. When an insurer or risk manager has a specific problem.
classification
Categorizing members of a dataset based on known characteristics.
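The sketch below shows classification in miniature, assuming a nearest-centroid classifier (one of many possible classification methods; the course cards do not name one). Past claims with known outcomes are averaged per category, and a new claim is assigned to the closest category. The features (attorney involved as 0/1, injury severity on a 1 to 5 scale), labels, and data are hypothetical.

```python
def train_centroids(rows, labels):
    """Average the feature vectors of each known category."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def classify(centroids, row):
    """Assign a new member to the category whose average is closest."""
    def dist2(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda lab: dist2(centroids[lab]))

# Hypothetical past claims: [attorney_involved, injury_severity] with known outcomes.
rows = [[1, 4], [1, 5], [0, 1], [0, 2]]
labels = ["litigated", "litigated", "not_litigated", "not_litigated"]
centroids = train_centroids(rows, labels)
print(classify(centroids, [1, 3]))  # litigated
```

The key point for classification is that the categories are known in advance and the model learns from members whose category is already known.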
chapter 1
data mining techniques
Classification Regression analysis Association rule learning Cluster analysis
Under the General Data Protection Regulation (GDPR), a data controller's role is to Select one: A. Represent the business aspects of data governance. B. Manage the flow of data for the rest of the organization. C. Define the metrics used to measure an organization's overall data quality. D. Define how and for what purpose personal data should be processed.
D. Define how and for what purpose personal data should be processed. (The GDPR specifically defines the tasks of the data controller, a mark of the importance the European Union places on the personal privacy of its citizens.)
Big data includes Select one: A. Neither structured nor unstructured data. B. Unstructured data only. C. Structured data only. D. Both structured and unstructured data.
D. Both structured and unstructured data.
Galliano Insurance Agency knows that it is most likely to retain the customers that it insures for multiple lines of coverage. The agency is always trying to identify new insurance products. To help identify new products, it needs a data mining technique that will explore data to find groups with common and previously unknown characteristics. Which one of the following data mining techniques should Galliano Insurance Agency use? Select one: A. Classification B. Regression analysis C. Association rule learning D. Cluster analysis
D. Cluster analysis
Data governance provides Select one: A. The internal data entry processes needed to capture accounting transactions. B. A dynamic view of data without needing to move it between systems. C. A road map that details where data is located. D. Definitions, standards and procedures for how data is used.
D. Definitions, standards and procedures for how data is used. (Data governance is the starting point, or rule set, for managing data.)
Which one of the following is an example of a data governance tool? Select one: A. Risk Management B. Metadata C. Data integration D. External Policy
D. External Policy (A data governance committee also uses internal policies, external policies, enterprise data models and collaborative tools such as agile project management to achieve its aims.)
Gustav is a claims representative from Forefront Insurance. He is accessing online weather reports to determine hazardous road conditions alleged by an injured claimant. These weather reports are an example of which one of the following types of data? Select one: A. External structured data B. Internal unstructured data C. Internal structured data D. External unstructured data
D. External unstructured data
The purpose of data analytics for insurers is to Select one: A. Automate most organizational processes. B. Eliminate the need for human analysis. C. Acquire and use all the new types of technology. D. Make data-driven decisions and strategy.
D. Make data-driven decisions and strategy.
Malware is defined as Select one: A. Software technology used to encrypt data. B. A tool for managing data security. C. A hardware-based security breach. D. Software designed to cause damage.
D. Software designed to cause damage. (to a computer, server, or network.)
Data scientists at Grisham Risk and Insurance are working with underwriting, claims, and other department professionals to design data mining projects to help generate better business solutions and decisions. Which one of the following is true regarding this collaboration at Grisham? Select one: A. It is important that there be clear divisions between the actuaries and the data scientists. B. The data scientists will concentrate on analyzing numerical or categorical data. C. For any project such as this, the claims professional will provide the domain knowledge for the others. D. The data scientists will explore underutilized sources of data, such as social media and social networks.
D. The data scientists will explore underutilized sources of data, such as social media and social networks.
The data quality principle of reasonability refers to Select one: A. The appropriateness of current data. B. The systematic process of tracing data. C. The comprehensive nature of data. D. The materiality or relevance of data.
D. The materiality or relevance of data. (testing whether the information provided is pertinent to the business objective at hand.)
Data science is especially useful for Select one: A. Structured data. B. Internal data. C. Databases. D. Unstructured data.
D. Unstructured data.
Tania works in the fraud unit for Greatview Insurance. There is a claimant who appears to be involved in multiple cases of insurance fraud. Tania decides to use social media to obtain information that may be used to develop a profile of the claimant. Tania's use of social media is an example of which one of the following types of data? Select one: A. Unstructured internal B. Structured internal C. Structured external D. Unstructured external
D. Unstructured external
Which of the following is an example of an insurer's use of new technology on traditional internal data? -Data collected by drones for adjusting claims. -Data collected through telematics devices on automobiles. -Data collected from text mining claim managers' notes. -All of the above.
-Data collected from text mining claim managers' notes.
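Text mining turns free-text notes (traditional internal data) into something countable. A minimal sketch, assuming the goal is to count how many claim notes mention certain keywords; the notes and keywords are hypothetical, and real text mining usually adds steps such as stemming and handling synonyms.

```python
import re
from collections import Counter

def keyword_counts(notes, keywords):
    """Count how many notes mention each keyword (case-insensitive, whole words)."""
    counts = Counter()
    for note in notes:
        # Split the note into lowercase words, ignoring punctuation.
        words = set(re.findall(r"[a-z']+", note.lower()))
        for kw in keywords:
            if kw in words:
                counts[kw] += 1
    return counts

# Hypothetical claim managers' notes.
notes = [
    "Claimant retained an attorney.",
    "Minor damage, no injury reported.",
    "Attorney letter received; injury claimed.",
]
print(keyword_counts(notes, ["attorney", "injury"]))
```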
Basic functions of a data management program
Data governance Data preparation and capture Data access Data quality Data integration
Structured data
Data organized into databases with defined fields, including links between databases.
External Data
Data that belongs to an entity other than the organization that wishes to acquire and use it.
Unstructured Data
Data that is not organized into predetermined formats, such as databases, and often consists of text, images, or other nontraditional media.
Internal data
Data that is owned by an organization.
Association rule learning
Examining data to discover new and interesting relationships. From these relationships, algorithms are used to develop rules to apply to new data. For example, an insurer can explore data to find relationships among the products its customers purchase.
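A simplified sketch of association rule learning, assuming each customer's set of purchased products is a "basket." It counts product pairs and keeps rules such as "auto implies home" whose support (share of all customers holding both) and confidence (share of customers holding the first product who also hold the second) meet hypothetical thresholds. This pairwise version is only an illustration; full algorithms such as Apriori also handle larger item sets.

```python
from itertools import combinations

def pair_rules(baskets, min_support=0.5, min_confidence=0.6):
    """Find rules lhs -> rhs between two products with enough support and confidence."""
    n = len(baskets)
    item_count, pair_count = {}, {}
    for basket in baskets:
        items = sorted(set(basket))
        for item in items:
            item_count[item] = item_count.get(item, 0) + 1
        for pair in combinations(items, 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        # Check the rule in both directions.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

# Hypothetical products held by four customers.
baskets = [["auto", "home"], ["auto", "home", "umbrella"], ["auto"], ["home", "auto"]]
for lhs, rhs, s, c in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

A rule like "home implies auto" could then be applied to new data, for example to flag homeowners customers who do not yet have auto coverage.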
There is little need to be concerned about price optimization because the insurance market is competitive. True False
F
Wearable devices, like Fitbits, are just gimmicks and have not provided any real benefits with regard to changes in health. True False
F
Data Management Benefits
Increased overall efficiency Enhanced on-demand access to data Sounder decision-making
Metadata
Information about the data: valuable and meaningful tool for identifying and understanding data, organizing data, creating connections between data, increasing the functionality and usability of data, and managing data quality.
domain knowledge
Knowledge of the context surrounding the data a data scientist is working with.
Big data
Sets of data that are too large to be gathered and analyzed by traditional methods.
A "Not only SQL" database allows for the collection of data that do not require a rigid schema (format). True False
T
After information is gleaned, the user must have a process to evaluate the accuracy of any conclusions drawn. (T/F)
T
Big data provides new opportunities for insurers to determine customers' willingness to pay for insurance coverage. True False
T
Close analysis may result in substantive findings that may not necessarily lead to actionable conclusions. (T/F)
T
Consumers' expectations are changing in terms of the technological features they expect from a company, e.g., online access to account information. Insurers that don't keep up with changing technology may lose market share. True False
T
Data "ingestion" refers to the need to quickly process data (i.e., use it as it is being collected). True False
T
Data "variety" refers to the fact that there are now many more sources and types of data than ever before. True False
T
Information technology can be applied to big data to reveal characteristics of people or events of interest. (T/F)
T
Insurers possess a lot of internal data that was not very accessible until the development of new techniques of data analysis. (T/F)
T
Regulators have a legitimate concern that insurers' use of big data may lead to excessive or discriminatory rates for coverage. True False
T
True or False? A cluster analysis involves using statistical methods to find commonalities across groups, e.g., groups of insureds.
T
Telematics
The use of technological devices in vehicles with wireless communication and GPS tracking that transmit data to businesses or government agencies; some return information for the driver.
Unit 2
cluster analysis
Using statistical methods, a computer program explores data to find groups with common and previously unknown characteristics.
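k-means is one common cluster analysis method (the cards do not name a specific algorithm). The minimal sketch below, assuming two numerical characteristics per insured, alternates between assigning each point to its nearest group center and moving each center to the average of its group. No categories are given in advance, which is what distinguishes cluster analysis from classification. The data, the starting centers, and the number of iterations are hypothetical; real implementations usually pick starting centers randomly.

```python
def kmeans(points, centers, iterations=10):
    """Group 2-D points around k centers; returns the final centers and groups."""
    def dist2(p, c):
        return (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2

    groups = []
    for _ in range(iterations):
        # Assignment step: put each point in the group of its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist2(p, centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group (keep it if empty).
        centers = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers, groups

# Hypothetical insureds: (age, number of policies held).
points = [(25, 1), (27, 1), (26, 2), (60, 3), (62, 4), (58, 3)]
centers, groups = kmeans(points, [(25, 1), (60, 3)])
print(groups)
```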
data management considerations associated with big data:
Volume Variety Velocity Veracity Value
Among the following apps, which saw the greatest increase in activity per minute (e.g., posts or messages) from 2012 to 2014? -Email -Twitter -WhatsApp
Data governance goals
accuracy, validity, timeliness, completeness
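Two of these goals, completeness and timeliness, lend themselves to simple automated checks. A minimal sketch, assuming policy records stored as dictionaries; the required field names and the one-day refresh window are hypothetical metrics an organization might choose, not standards from the course.

```python
from datetime import date

# Hypothetical fields every policy record must contain.
REQUIRED_FIELDS = ("policy_id", "insured_name", "effective_date")

def completeness(records):
    """Share of records with a non-empty value for every required field."""
    if not records:
        return 0.0
    complete = sum(
        all(r.get(f) not in (None, "") for f in REQUIRED_FIELDS) for r in records
    )
    return complete / len(records)

def is_timely(last_refresh, today, max_age_days=1):
    """True if the data was refreshed within the allowed number of days."""
    return (today - last_refresh).days <= max_age_days

records = [
    {"policy_id": "P1", "insured_name": "A. Smith", "effective_date": date(2024, 1, 1)},
    {"policy_id": "P2", "insured_name": "", "effective_date": date(2024, 2, 1)},
]
print(completeness(records))                          # 0.5
print(is_timely(date(2024, 3, 1), date(2024, 3, 2)))  # True
```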
quiz 1
quiz 2