iCAS_DS1

Which one of the following is a type of scatter plot? A. A correlation matrix B. A bubble plot C. A summary statistic D. A histogram

B. A bubble plot

Which one of the following can be used to pull together and match all the data for each claim, even when data is stored in disparate sources? A. A claims data warehouse B. A common key C. Different grains D. Claims transactions

B. A common key

Which one of the following is correct regarding how a data scientist can obtain HTML data from the internet? A. A data scientist must request all the information contained on a website. B. A data scientist can obtain HTML data from multiple websites on a repeated basis. C. A data scientist can only obtain HTML data from one website at a time. D. A data scientist must submit a request each time HTML data is needed.

B. A data scientist can obtain HTML data from multiple websites on a repeated basis.

An insurer wants to examine the age distributions of its homeowners policyholders. Which one of the following data visualization methods would best accomplish this? A. A correlation matrix B. A histogram C. A bubble plot D. A scatter plot

B. A histogram. The insurer could use a histogram to examine the age distribution of its homeowners policyholders. To do so, a data analyst would determine age groupings and restructure the insurer's data into a table of those groupings. From that table, a histogram could be created.
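The grouping step described above can be sketched in Python. This is a minimal sketch that assumes hypothetical policyholder ages and ten-year age groups. A plotting library would then draw one bar per group from the resulting table; here a text bar stands in for it.

```python
from collections import Counter

# Hypothetical ages of homeowners policyholders (illustrative data only)
ages = [23, 34, 35, 41, 47, 52, 58, 61, 63, 67, 72, 38, 44]

def age_group(age: int, width: int = 10) -> str:
    """Map an age to a grouping label such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

# Restructure the raw ages into a table of counts per age grouping
table = Counter(age_group(a) for a in ages)

# Text histogram: one row per grouping, one '#' per policyholder
for group in sorted(table):
    print(f"{group:>6} {'#' * table[group]}")
```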

If a data scientist is creating a table in a SQL relational database, he or she will select which one of the following to identify the table? A. R B. A primary key C. Sqldf D. Any column

B. A primary key

Which one of the following is true regarding scatterplots and correlations? A. If the scale of a scatterplot changes, its correlation strength also changes. B. A scatterplot's form reveals whether the points are associated with one another. C. A scatterplot's strength is described by labels that remain consistent without variation. D. Confidence intervals are plotted to display the variability of noise in the data.

B. A scatterplot's form reveals whether the points are associated with one another.

Binning is A. When a predictive model fits not just patterns in the data, but also random fluctuations in that data that will not be present when the model is applied to another dataset. B. A way of dealing with numeric variables when the relationship between the variable and the target variable is unknown or changes with the level of the other variable. C. A set of mathematical operations, such as regression, classification trees, and clustering, that can be used to achieve a desired result. D. A statistic used to evaluate relationships between categorical variables. A high value supports the hypothesis of a significant relationship.

B. A way of dealing with numeric variables when the relationship between the variable and the target variable is unknown or changes with the level of the other variable.

Which one of the following organizations provides financial ratings used by insurers, producers, reinsurers, insureds, and investors? A. The Comprehensive Loss Underwriting Exchange B. A.M. Best C. Insurance Services Office, Inc. D. The American Association of Insurance Services

B. A.M. Best

Which one of the following categories of data quality measures how well data represents true values and the business information being analyzed? A. Timeliness B. Accuracy C. Validity D. Reasonability

B. Accuracy

Any solution achieved by employing the model for data-driven decision making is best characterized as which one of the following? A. Predictive B. Actionable C. Traditional D. Descriptive

B. Actionable

Which one of the following is a limitation of algorithms? A. Algorithms identify patterns too subtle to be detected by human observation. B. Algorithms are literal. C. Algorithms perform analytical operations quickly. D. Algorithms make predictions more accurate.

B. Algorithms are literal.

A data scientist wants to look for heart, lung, and kidney diseases. Which one of the following Perl regular expressions can match any reference for one of these disease types in a text file? A. Grep function B. Alternating matches C. Split function D. s/// operator

B. Alternating matches
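Alternation uses the `|` operator so that one pattern matches any of several alternatives. The sketch below uses Python's `re` module, whose alternation syntax is the same as Perl's; the claim-note lines are hypothetical.

```python
import re

# Alternation: (heart|lung|kidney) matches any one of the three disease types.
# \b word boundaries keep the pattern from matching inside longer words.
pattern = re.compile(r"\b(heart|lung|kidney)\b", re.IGNORECASE)

# Hypothetical lines from a claim-notes text file
lines = [
    "Claimant reports a history of heart disease.",
    "No prior conditions noted.",
    "Kidney failure diagnosed in 2019.",
    "Chronic lung disease; see attached report.",
]

matches = [line for line in lines if pattern.search(line)]
for line in matches:
    print(line)
```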

Which one of the following is an example of a piece of qualitative data? A. The number of auto insureds in a particular region B. An assessment of an auto insured's likelihood of being involved in an auto accident C. The ages of five insureds D. Temperatures measured on the Celsius scale

B. An assessment of an auto insured's likelihood of being involved in an auto accident

A data scientist is working on a claims triage model. Triage for claims will take place at first notice of loss (FNOL). Which one of the following should the data scientist understand? A. By the time a call center receives an FNOL, a claims professional has made an initial reserve estimate. B. Any data captured or revised after the FNOL will have to be excluded. C. The initial reserve estimate will be important in the triage model. D. Time-mismatched data can be used after it is consolidated.

B. Any data captured or revised after the FNOL will have to be excluded.

Cryptographic hash functions A. Are designed so that two different inputs (searches) produce the same output (result). B. Are cost-effective if the elements of a set of objects will be compared with each other repeatedly. C. Are only useful for equality testing small or uncomplicated data structures. D. Are advantageous in their requirement of computing the hash value for each object, which can be done very quickly.

B. Are cost-effective if the elements of a set of objects will be compared with each other repeatedly.
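The saving comes from computing each object's hash once and then comparing the short, fixed-length digests instead of the full objects. A minimal sketch using SHA-256 from Python's standard `hashlib`, with hypothetical records; treating equal digests as equal records relies on collisions being practically impossible for a cryptographic hash.

```python
import hashlib
from itertools import combinations

# Hypothetical records that will be compared with each other repeatedly
records = [
    "Rear-end collision at intersection; minor bumper damage.",
    "Water damage to basement after pipe burst.",
    "Rear-end collision at intersection; minor bumper damage.",
]

# Compute each hash value once (the up-front cost)...
digests = [hashlib.sha256(r.encode("utf-8")).hexdigest() for r in records]

# ...then every pairwise equality test compares fixed-length digests
duplicates = [
    (i, j)
    for i, j in combinations(range(len(records)), 2)
    if digests[i] == digests[j]
]
print(duplicates)  # → [(0, 2)]
```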

Because of the variety of auto coverages available and the possibility that multiple autos are covered by one or more policies, a data scientist who is creating an auto data test sample should A. Choose data for the test sample based on a random attribute, such as policy number. B. Attempt to gather a test sample that represents how and from whom the auto data was collected. C. Test all the data in a large database. D. Only use well-established data, such as credit scores.

B. Attempt to gather a test sample that represents how and from whom the auto data was collected.

A Perl function that can be used to create one regular expression to change part of the expression slightly in each instance in which it applies is a A. Lookbehind. B. Backreference. C. Lookahead. D. Grep function.

B. Backreference.
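A backreference reinserts whatever a captured group matched, so a single expression yields a different result for each match. A sketch in Python's `re` module (Perl writes the same replacement as `$2 $1` inside `s///`); the name entries are hypothetical.

```python
import re

# Hypothetical "Last, First" name entries
names = ["Smith, John", "Garcia, Maria", "Lee, Ann"]

# (\w+) groups capture the last and first names; in the replacement,
# \2 and \1 are backreferences to those captured groups.
pattern = re.compile(r"^(\w+), (\w+)$")
reordered = [pattern.sub(r"\2 \1", name) for name in names]
print(reordered)  # → ['John Smith', 'Maria Garcia', 'Ann Lee']
```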

In using Structured Query Language (SQL), a primary key can A. Consist of a single column of data. B. Be a column serving as a unique identifier. C. Rarely be the basis for a designated index. D. Include null values.

B. Be a column serving as a unique identifier.

Which one of the following is correct regarding data marts? A. The data in data warehouses will typically be stored in a data mart that uses JSON. B. Because data marts are smaller in size than data warehouses, it is faster to extract data from them. C. Data marts consist of many data warehouses. D. Because data marts are so large, it is slower to extract data from them.

B. Because data marts are smaller in size than data warehouses, it is faster to extract data from them.

Ramon and his manager are viewing several different displays, each created from the same data, to help detect anomalies in the data. Ramon's manager points out possible outliers just beyond the whiskers in one of the displays. To which one of the following types of displays is the manager referring? A. Histogram B. Boxplot C. Bar graph D. Scatterplot

B. Boxplot

Which one of the following is one of the two basic formats in which the United States Census Bureau provides data? A. Census tract codes B. Census block group C. Federal Information Processing Standards codes D. American National Standards Institute code

B. Census block group

A data scientist is creating a table in a relational database regarding auto claims. The data fields include claim number, insured last name, date of loss, annual premium, and amount of loss. Which one of the following data fields would the data scientist select as the primary key? A. Date of loss B. Claim number C. Insured last name D. Amount of loss

B. Claim number
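The reasoning can be checked with an in-memory SQLite table built with Python's `sqlite3` module. Table and column names are hypothetical. Claim number uniquely identifies each row. Last name, date of loss, and amounts can repeat across claims, so none of those fields can serve as the key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is stated explicitly because SQLite otherwise tolerates NULLs
# in non-integer primary key columns.
conn.execute("""
    CREATE TABLE auto_claims (
        claim_number      TEXT PRIMARY KEY NOT NULL,
        insured_last_name TEXT,
        date_of_loss      TEXT,
        annual_premium    REAL,
        amount_of_loss    REAL
    )
""")
# Two claims can share a last name, date of loss, and loss amount...
conn.execute("INSERT INTO auto_claims VALUES ('C-1001', 'Smith', '2024-03-02', 1200.0, 4500.0)")
conn.execute("INSERT INTO auto_claims VALUES ('C-1002', 'Smith', '2024-03-02', 950.0, 4500.0)")

try:
    # ...but a second row with an existing claim number violates the key
    conn.execute("INSERT INTO auto_claims VALUES ('C-1001', 'Jones', '2024-04-10', 800.0, 300.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # → True
```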

Which one of the following is the regular expression that can be used to separate words in a string expression? A. "\W" B. "\w" C. "\S" D. "\s"

D. "\s"
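`\s` matches a single whitespace character (space, tab, newline); `\s+` treats a run of them as one separator. `\S` is its opposite, while `\w` and `\W` match word and non-word characters. A short Python illustration with a made-up string:

```python
import re

text = "Water damage\tto  kitchen\nceiling"
# Split on runs of whitespace; a raw string keeps the backslash intact
words = re.split(r"\s+", text.strip())
print(words)  # → ['Water', 'damage', 'to', 'kitchen', 'ceiling']
```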

Which one of the following is correct regarding a data scientist's analysis of unstructured text data? A. A data scientist should analyze the text data in its original form because it is simpler. B. A data scientist can use the same methods of analysis with unstructured text data that are used with structured data. C. A data scientist should avoid using text data when possible because of its lack of structure. D. A data scientist should move unstructured text data into a structured environment.

D. A data scientist should move unstructured text data into a structured environment.

A normal distribution is characterized by A. A skewness greater than zero. B. A heavy tail. C. Being right skewed. D. A kurtosis of three.

D. A kurtosis of three.

A data scientist typically uses an application programming interface (API) to A. Communicate with multiple websites simultaneously. B. Communicate with Facebook users. C. Access United States government information only. D. Access data of interest on large websites.

D. Access data of interest on large websites.

One important detail to remember when creating log graphs is to A. Make sure the logarithmic scale is always displayed on the x-axis of a semi-log graph. B. Consistently use a base 10 logarithmic scale for succinctness and the most meaningful results. C. Ensure that the linear scale shows and draws attention to the area where outlier values occur. D. Advise end users that log graphs are not conventional linear graphs so that they correctly perceive the results.

D. Advise end users that log graphs are not conventional linear graphs so that they correctly perceive the results.

Which one of the following best describes the philosophy of graphical integrity? A. Allowing the display to overpower the understanding of the data B. Providing the graphical elements without any artistic representation C. Using color effectively to overtake the display and data D. Aligning the information's purpose with the chosen display

D. Aligning the information's purpose with the chosen display

ABC Insurance wrote a new homeowners policy for a one-year term. The accounting date was 12/20X6, the inception date was 3/20X7, and the effective date was 3/20X7. From this information, how will Insurance Services Office, Inc. (ISO) process this information from the premium transaction record? A. The policy expiration date is calculated as 12 months past the accounting date. B. The written premium is earned into the 20X6 and 20X7 calendar years appropriately. C. For reconciliation with Statutory Page 14 data, the ABC written premium records are aggregated and compared with its 20X7 annual statement. D. All the codes entered on the premium transaction record are validated against the statistical plan codes in effect on 3/20X7.

D. All the codes entered on the premium transaction record are validated against the statistical plan codes in effect on 3/20X7.

Which one of the following best describes a data quality advocate? A. A person who manages the quality of data for his or her organization but not for external suppliers. B. An analyst who no longer needs to screen data for problems. C. Actuarial teams that spend one-quarter of their efforts on data quality issues. D. An actuary or a modeler who is familiar with data quality literature or course material.

D. An actuary or a modeler who is familiar with data quality literature or course material.

Which one of the following is an example of a piece of nominal data? A. Temperatures measured on the Celsius scale B. The number of auto insureds in a particular region C. A ratio scale measurement D. An auto insured's likelihood of being involved in an accident

D. An auto insured's likelihood of being involved in an accident

Which one of the following insurers would most likely reap the benefits of aggregated statistical plan data? A. An insurer that has access to extensive databases of information B. An insurer that has the volume of data needed to determine appropriate rates C. An insurer that is downsizing and wants to gradually reduce its books of business D. An insurer that wants to expand into a different territory and offer new coverages

D. An insurer that wants to expand into a different territory and offer new coverages

Which one of the following is true concerning the approximate matching of strings to specifications in a query? A. Approximate matching results from searching for correctly spelled, but inaccurate, entries in a database. B. Search results that are inappropriate for use by data scientists are referred to as "fuzzy matches." C. Data scientists are unable to ignore misspellings in the data they are searching and analyzing. D. Approximate matching of strings is necessary to compensate for possible typographical errors.

D. Approximate matching of strings is necessary to compensate for possible typographical errors.
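One way to compensate for typographical errors is a similarity score with a cutoff. The sketch below uses `difflib.get_close_matches` from Python's standard library, which is only one of several approximate-matching approaches (edit distance is another); the city list and the typo are hypothetical.

```python
import difflib

# Hypothetical reference list of valid city names
valid_cities = ["Hartford", "Stamford", "New Haven", "Bridgeport"]

# A typo in the data that an exact-match query would miss
entry = "Hartfrod"

exact = [c for c in valid_cities if c == entry]
# Return the best candidate whose similarity ratio is at least 0.6
approx = difflib.get_close_matches(entry, valid_cities, n=1, cutoff=0.6)
print(exact, approx)
```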

If the regular expression of "/^BI/" is used in a string matching application, which one of the following would be a match? A. B00874I B. 270BI81 C. 76829BI D. BI20087

D. BI20087
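The `^` anchor requires the match to start at the beginning of the string; the surrounding slashes in `/^BI/` are Perl delimiters, not part of the pattern. Checking the four options with Python's `re`:

```python
import re

options = ["B00874I", "270BI81", "76829BI", "BI20087"]
# Only strings that begin with the characters "BI" match
matches = [s for s in options if re.search(r"^BI", s)]
print(matches)  # → ['BI20087']
```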

The Unicode Standard is intended to A. Replace the programs used by different computer manufacturers. B. Replace the International Organization for Standardization's standard. C. Be compatible with the standards used by different computer and programming manufacturers. D. Be compatible with the International Organization for Standardization's standard.

D. Be compatible with the International Organization for Standardization's standard.

Which one of the following is best described as a visual display of the five-point summary of a variable's distribution? A. Bar graph B. Categorical table C. Histogram D. Boxplot

D. Boxplot
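The values a boxplot draws can be computed directly. A sketch with hypothetical data using Python's `statistics.quantiles`. Quartile conventions differ between tools, so `method="inclusive"` is one choice among several. The 1.5 × IQR whisker rule is a common convention (Tukey's), not the only one.

```python
import statistics

# Hypothetical claim amounts in thousands of dollars
data = sorted([2, 3, 3, 4, 5, 5, 6, 7, 8, 40])

q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
five_number = (data[0], q1, median, q3, data[-1])

# The box spans the interquartile range (IQR); points beyond
# 1.5 * IQR from the box are flagged as possible outliers.
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print(five_number)  # → (2, 3.25, 5.0, 6.75, 40)
print(outliers)     # → [40]
```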

The people at an organization who use data on a regular basis and are often the most qualified to provide feedback on data needed for analysis are identified as A. Technical experts. B. Custodians of data. C. Information technology (IT) employees. D. Business analysts.

D. Business analysts.

To be most effective, the data science team working at an insurance company A. Should rely on underwriters to study patterns of claimants and medical providers. B. Consults with subject matter experts but does not include them as valuable team members. C. Is best served with information technology (IT) professionals and actuaries having some domain knowledge. D. Can use risk managers and insurance professionals to provide context for data science projects.

D. Can use risk managers and insurance professionals to provide context for data science projects.

Which one of the following best describes a heat map? A. Heat maps do not pose the usual visualization problems for color-blind users. B. Heat maps are created to be viewed without any interactivity with the user. C. Colors should be selected that overpower each other to show needed emphasis. D. Colors and intensities may be integrated to produce a tie-dyed effect.

D. Colors and intensities may be integrated to produce a tie-dyed effect.

Monique is designing and constructing a dashboard for a start-up company that has provided her with the organization's first (and only) three months of data on customer demographics, sales by state, income, employee roster by department with phone numbers, revenue, expenses, and pictorial product list with prices. To most effectively display the information, Monique A. Shows income on an interactive map and the customer demographics in a scrolling series of tables. B. Designs the dashboard with each of the seven elements displayed on side-by-side Excel spreadsheets for conformity. C. Uses graphs of various colors to best display the employee roster and pictorial product list with prices. D. Compares revenue and expenses by month on a graph, and chooses other displays based on each element's particular information.

D. Compares revenue and expenses by month on a graph, and chooses other displays based on each element's particular information.

Duplicated, deleted, and excluded data are issues related to which category of data quality? A. Validity. B. Reasonability. C. Timeliness. D. Completeness.

D. Completeness.

Horace is a data scientist analyzing past severe injury claims that became lawsuits and later were tried in court. He wants to use the loss paid data and the total legal expense paid data in a reserving application for new and similar bodily injury claims. Horace will need to create a new variable combining the two components of his existing data. Which one of the following best describes what Horace does? A. Censors a variable B. Converts into binned categories C. Normalizes data D. Computes derived variables

D. Computes derived variables
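Horace's new variable is the sum of two existing fields. A minimal sketch in plain Python; the claim records and the field name `total_paid` are hypothetical.

```python
# Hypothetical closed claims: (claim number, loss paid, legal expense paid)
claims = [
    ("BI-001", 250_000.0, 80_000.0),
    ("BI-002", 1_200_000.0, 310_000.0),
    ("BI-003", 95_000.0, 42_500.0),
]

# Derived variable: total paid = loss paid + legal expense paid
with_total = [
    {"claim": c, "loss_paid": loss, "legal_paid": legal, "total_paid": loss + legal}
    for c, loss, legal in claims
]
for row in with_total:
    print(row["claim"], row["total_paid"])
```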

The coverage trigger rating element in the Insurance Services Office, Inc. (ISO) general liability statistical plan is captured in the coverage code data element to distinguish whether A. Payroll or square feet determine the premium. B. The insured business is a restaurant or plant nursery. C. The policy covers premises or operations. D. Coverage is provided on an occurrence or a claims-made basis.

D. Coverage is provided on an occurrence or a claims-made basis.

Which one of the following best describes the primary use of credit data? A. Assessment of personal claims potential B. Calculation of insurance rates C. Calculation of credit-based insurance scores D. Creation of individual credit scores

D. Creation of individual credit scores

Another term for some business analysts is a(n) A. Custodian of data. B. Technological expert. C. Information technology (IT) employee. D. Data steward.

D. Data steward.

A potential discount for personal auto insurance that can be captured in a data element and also (in some class plans) collected through the class code is A. Antilock brakes. B. Passive restraint. C. Antitheft devices. D. Defensive driver.

D. Defensive driver. A defensive driver discount for personal auto insurance can be captured in a data element and also (in some class plans) collected through the class code. Antilock brakes, antitheft devices, and passive restraint are discounts, but they are not captured in the class code.

Elizabeth is an actuary working for an insurance company. She is hoping to make further use of information her company reported to a statistical agent and that is required under a summary-based statistical plan. This summarized basis does not include policy limits or collection of exposures. Which one of the following is Elizabeth most likely to discover? A. Elizabeth can calculate the effect of a deductible change and price it accordingly. B. Elizabeth can use the extension of exposure technique, which is more precise for ratemaking than the on-level method. C. Elizabeth can use policy limit as a predictive variable in an analysis with the level of detail collected. D. Elizabeth finds that in a summary-based statistical plan, premium and loss transaction records with the same data elements are combined into a single record.

D. Elizabeth finds that in a summary-based statistical plan, premium and loss transaction records with the same data elements are combined into a single record.

A general guideline for the use of points and lines on a graph is to A. Use a straight line as a trend line rather than a curved or another shape. B. Display five or more lines on one graph for comparison. C. Use dashed and dotted lines to differentiate one line from another. D. Emphasize one line by making it black, with the other lines one different color.

D. Emphasize one line by making it black, with the other lines one different color.

Which one of the following is a process a data scientist uses to transform data into the appropriate format for a data warehouse? A. Query B. Business intelligence C. Structured Query Language (SQL) D. Extract, transform, and load (ETL)

D. Extract, transform, and load (ETL)
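The three ETL stages can be sketched with Python's standard library. The source extract, its messy formatting, and the `policy_fact` table are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Extract: read raw records from a hypothetical source-system export
raw_csv = """policy_no,state,premium
P-100, ct ,"1,200.50"
P-101,NY,980
P-102, nj,"2,050.00"
"""
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Transform: standardize state codes and convert text amounts to numbers
clean = [
    (r["policy_no"].strip(), r["state"].strip().upper(),
     float(r["premium"].replace(",", "")))
    for r in rows
]

# Load: write the conformed rows into the warehouse table
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE policy_fact (policy_no TEXT PRIMARY KEY, state TEXT, premium REAL)"
)
warehouse.executemany("INSERT INTO policy_fact VALUES (?, ?, ?)", clean)
warehouse.commit()

result = warehouse.execute(
    "SELECT state, premium FROM policy_fact ORDER BY policy_no"
).fetchall()
print(result)
```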

Alicia is selecting the display elements to be used in visualizing her data in a graph. Which one of the following would use color appropriately to provide a meaningful display? A. Alicia wants to boldly contrast two trend lines on the graph and so uses yellow and blue from opposite sides of the color wheel. B. Alicia is choosing greens and blues to convey calmness, even though her employer mandates a different color scheme be used. C. To highlight one section of a pie chart, Alicia separates the slice and colors it with a lighter shade of the same color. D. For her New York-based clients, Alicia uses red to draw attention to recent business losses and negative earnings.

D. For her New York-based clients, Alicia uses red to draw attention to recent business losses and negative earnings.

The ability of a model to apply itself to data outside the training data is called A. Overfitting. B. Separation. C. Sampling. D. Generalization.

D. Generalization.

Which one of the following is true regarding the Health Insurance Portability and Accountability Act (HIPAA) of 1996? A. An employee's authorization for disclosure of his or her personal medical records is always required. B. HIPAA is a state law that aims to improve the efficiency of the health information system. C. Sharing information with an employer may be done without the patient's authorization. D. HIPAA's Privacy Rule applies to defined covered entities and business associates.

D. HIPAA's Privacy Rule applies to defined covered entities and business associates.

Data fields in a formatted text file A. Are separated by a delimiter. B. Have a variable width. C. Contain stopwords. D. Have a prespecified width.

D. Have a prespecified width.
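Because each field occupies a fixed span of characters, a formatted (fixed-width) record is parsed by slicing positions rather than splitting on a delimiter. A sketch in Python; the record layout and sample lines are hypothetical.

```python
# Hypothetical layout: (field name, start, end) as Python slice positions
LAYOUT = [("policy_no", 0, 8), ("state", 8, 10), ("premium", 10, 19)]

lines = [
    "P0000001CT  1200.50",
    "P0000002NY   980.00",
]

def parse_fixed_width(line: str) -> dict:
    """Slice each field from its prespecified position; no delimiter needed."""
    return {name: line[start:end].strip() for name, start, end in LAYOUT}

records = [parse_fixed_width(line) for line in lines]
print(records[0])  # → {'policy_no': 'P0000001', 'state': 'CT', 'premium': '1200.50'}
```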

Blind tests help increase reliability by helping to prevent which effect? A. Piper Effect B. Experimental Design Effect C. Vision Effect D. Hawthorne Effect

D. Hawthorne Effect

Which one of the following is true regarding censoring a variable? A. After censoring, the resulting display has more extremely high and low values. B. This transformation raises the influence of extreme values on statistical procedures and models. C. The transformation has no insurance applications as its variables are positively skewed. D. In insurance, the data scientist may limit the large positive values for a variable to a lower value, such as $1 million.

D. In insurance, the data scientist may limit the large positive values for a variable to a lower value, such as $1 million.
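Censoring at $1 million replaces every larger value with the cap itself. A one-line sketch in Python with hypothetical claim amounts:

```python
# Hypothetical claim amounts in dollars
losses = [12_000, 450_000, 2_750_000, 80_000, 15_000_000]

CAP = 1_000_000  # censoring point

# Values above the cap are set to the cap, limiting the influence of extremes
censored = [min(x, CAP) for x in losses]
print(censored)  # → [12000, 450000, 1000000, 80000, 1000000]
```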

Which one of the following is true regarding a data governance committee (DGC) for an insurer? A. A disadvantage of a DGC is its lack of authority to quickly approve resources for remediation. B. Most DGC members are chosen from the information technology (IT) department. C. Because of its cross-functional representation, many efforts are conflicting and redundant. D. In lieu of a chief data officer, an analyst may be selected as the chief data steward.

D. In lieu of a chief data officer, an analyst may be selected as the chief data steward.

Which one of the following is one of the four fundamental concepts of data science? A. Data mining is always profitable and worthwhile, even if business results cannot be improved. B. Data science strives to develop models that primarily fit historical data. C. Data analysis is more applicable to a problem the more closely the data is analyzed. D. Information technology applied to big data can reveal characteristics of groups of people.

D. Information technology applied to big data can reveal characteristics of groups of people.

Which one of the following statements is most accurate? A. Insurers have only recently been involved in analytics. B. Insurers have typically focused only on externally generated data. C. Access to greater volumes of external data has not resulted in an increase in model sophistication. D. Insurers have typically focused on their own internally generated data.

D. Insurers have typically focused on their own internally generated data.

The "box" in a box and whiskers display represents the A. Minimum value. B. Maximum value. C. Outliers. D. Interquartile range.

D. Interquartile range.

John is an actuary, Suzanne is a statistician, Manuel analyzes social network data, and Jose is a computer programmer. Without knowing more about any of these practitioners, which one has at least some knowledge of insurance matters? A. Manuel B. Suzanne C. Jose D. John

D. John

Insurance Services Office, Inc. (ISO) collects personal auto information for which three broad categories of coverage? A. Standard, nonstandard, and defensive driver B. No-fault, other-than-collision, and collision C. Policy limits, driver, and deductible D. Liability, no fault, and physical damage

D. Liability, no fault, and physical damage

Sam is a data scientist who is trying to create a meaningful display of pricing data that has several points that are much smaller than the rest of the provided data. His manager has told him that for his project all data must be used. Given this information, which one of the following is the best choice of display for Sam to use? A. Network graph B. Interactive map C. Dashboard D. Log graph

D. Log graph

Grandview Insurance keeps customer records. Although different departments, such as claims, underwriting, operations, and marketing, each have their own view of how this data should be structured, an agreement was reached so that all records follow a standard format and contain the same elements. The organization found that this common agreement of a business definition is essential for its customer records. This agreement is an example of A. High-velocity data. B. A data dictionary. C. Big data. D. Master data.

D. Master data.

Millstone Insurance Company has many types of metadata to help describe and provide context for the large amounts of data it needs to collect and analyze. Its metadata that includes its refreshing cycles for migration and preservation of data is considered A. Technical metadata. B. Descriptive metadata. C. Business metadata. D. Operational metadata.

D. Operational metadata.

Which one of the following types of software can a data scientist use to enable R to read data directly from another source? A. Relational database B. SQL C. Excel D. Packages

D. Packages

Which one of the following steps undertaken by an analyst during a data review can be particularly helpful in detecting data anomalies? A. Determine data or metadata definitions B. Review prior data C. Identify questionable data values D. Perform exploratory analysis

D. Perform exploratory analysis

The following call to the qplot function is executed: qplot(a, b, facets = c ~ d) Which one of the following best describes the graph created by this code? A. Plots a in the y-axis against b in the x-axis in four graphs. The graph in the upper right corner will display only the data where c has its largest value and d has its smallest value. B. Plots a in the x-axis against b in the y-axis in four graphs. The graph in the upper right corner will display only the data where c has its largest value and d has its smallest value. C. Plots a in the y-axis against b in the x-axis in four graphs. The graph in the upper right corner will display only the data where c has its smallest value and d has its largest value. D. Plots a in the x-axis against b in the y-axis in four graphs. The graph in the upper right corner will display only the data where c has its smallest value and d has its largest value.

D. Plots a in the x-axis against b in the y-axis in four graphs. The graph in the upper right corner will display only the data where c has its smallest value and d has its largest value.

A data scientist is designing a SQL table to contain data about auto insurance policies. The data fields include policy number, premium, loss amounts, insured name, and territory. Which one of the data fields would the data scientist select as the primary key? A. Territory B. Premium C. Insured name D. Policy number

D. Policy number

If a data scientist needs to combine data from multiple tables in the same SQL database, he or she would use which one of the following types of SQL statements? A. INSERT B. Normalize C. Read data D. Query

D. Query
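A query (a `SELECT` statement) with a `JOIN` is how rows from multiple tables are combined. A sketch using an in-memory SQLite database through Python's `sqlite3`; the tables, columns, and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE policies (policy_no TEXT PRIMARY KEY, insured_name TEXT, territory TEXT);
    CREATE TABLE claims   (claim_no TEXT PRIMARY KEY, policy_no TEXT, loss_amount REAL);
    INSERT INTO policies VALUES ('P-1', 'Smith', 'North'), ('P-2', 'Jones', 'South');
    INSERT INTO claims   VALUES ('C-10', 'P-1', 5000), ('C-11', 'P-1', 1200), ('C-12', 'P-2', 800);
""")

# Join the two tables on the shared policy number and total the losses
rows = conn.execute("""
    SELECT p.insured_name, p.territory, SUM(c.loss_amount) AS total_loss
    FROM policies AS p
    JOIN claims AS c ON c.policy_no = p.policy_no
    GROUP BY p.policy_no
    ORDER BY p.policy_no
""").fetchall()
print(rows)  # → [('Smith', 'North', 6200.0), ('Jones', 'South', 800.0)]
```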

An advisory organization especially helps an insurer with which one of the following activities? A. Claims B. Marketing C. Loss control D. Ratemaking

D. Ratemaking

The first step in building a data set for triage modeling is to A. Gather data at both the time of triage and ultimate claims resolution. B. Prepare the data for analysis. C. Query, transform, join, and cleanse the data. D. Recognize the timing of the data valuation relative to its use.

D. Recognize the timing of the data valuation relative to its use.

Monique is a new analyst working for a clothing retailer of coats and jackets. She is trying to understand the effect of declining sales in states where warmer than expected weather was experienced over the entire winter. She has sales data from all the stores in the affected area. Which one of the following transformations will Monique use to gain a better understanding of the situation? A. Binning procedures to bin categorical variables B. Normalizing data in a database for efficiency C. Creating derived variables necessary for business reasons D. Smoothing the data using a statistical software program

D. Smoothing the data using a statistical software program
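A trailing moving average is one of the simplest smoothing methods that a statistical package would offer. It dampens week-to-week noise so the underlying decline stands out. A plain-Python sketch with hypothetical weekly sales for one store:

```python
# Hypothetical weekly coat sales for one store (units)
weekly_sales = [120, 135, 90, 140, 80, 95, 60, 70, 55, 65]

def moving_average(values, window=3):
    """Trailing moving average: mean of each run of `window` consecutive values."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

smoothed = moving_average(weekly_sales)
print([round(x, 1) for x in smoothed])
```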

Which one of the following statements about snapshot data is most accurate? A. Database snapshots can be altered. B. Each set of snapshot data is stored on a different file server. C. Database snapshots cannot be queried for information. D. Snapshots exist on the file server until they are manually removed from the server.

D. Snapshots exist on the file server until they are manually removed from the server.

Which one of the following explains why social media data differs from other types of data? A. Social media data is rarely of any business value. B. Social media data cannot be analyzed. C. Social media data is typically inaccurate. D. Social media data focuses on relationships.

D. Social media data focuses on relationships.

Examples of technical metadata are A. Data field names and glossary terms. B. Data stewards and business rules. C. Scripts used to access data and the types of quality checks performed. D. Source table information and the table of contents.

D. Source table information and the table of contents. Technical metadata describes information about technology such as the ownership of the database, physical characteristics of a database, performance tuning (processors, indexing), table name, column name, data type, relationship between the tables, etc.

An insurer's business need for data obtained through statistical plans is mainly concerned with A. Market conduct. B. Regulatory requirements. C. Financial solvency. D. The cost and pricing of insurance.

D. The cost and pricing of insurance.

A data scientist decides to perform analysis of potential fraud by using the actual recordings of claimants' statements rather than transcripts of the statements. Which one of the following explains why a data scientist would decide on this method? A. Analysis of the recorded statements will better indicate how well the adjuster handled the process. B. It is impossible to detect potential fraud from a statement without actually hearing the recording. C. It is simpler to analyze voice recordings than it is to analyze the text in transcripts. D. The data scientist would like to use computer technology to analyze voice patterns that might indicate lying.

D. The data scientist would like to use computer technology to analyze voice patterns that might indicate lying.

When using UNION in Structured Query Language (SQL), A. The data types of each column need not be the same. B. This produces the same results as SUM. C. This produces the same results as GROUP BY. D. The datasets must have the same number of columns.

D. The datasets must have the same number of columns.
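
As an illustration, the following Python sketch uses the standard-library sqlite3 module with two hypothetical tables (auto_claims and home_claims, invented for this example). It shows UNION stacking two queries that return the same number of columns, and SQLite rejecting a pair that does not. SQLite is loosely typed, so its rules on column data types may be more permissive than those of other database systems.

```python
import sqlite3

# Hypothetical tables, created in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auto_claims (claim_id INTEGER, amount REAL);
    CREATE TABLE home_claims (claim_id INTEGER, amount REAL);
    INSERT INTO auto_claims VALUES (1, 500.0), (2, 1200.0);
    INSERT INTO home_claims VALUES (3, 800.0), (2, 1200.0);
""")

# Both SELECTs return two columns, so UNION can stack the results.
# UNION drops the duplicate row (2, 1200.0); UNION ALL would keep it.
rows = conn.execute("""
    SELECT claim_id, amount FROM auto_claims
    UNION
    SELECT claim_id, amount FROM home_claims
    ORDER BY claim_id
""").fetchall()
print(rows)

# A UNION of queries with different column counts is rejected.
mismatch_rejected = False
try:
    conn.execute(
        "SELECT claim_id, amount FROM auto_claims "
        "UNION SELECT claim_id FROM home_claims"
    )
except sqlite3.OperationalError:
    mismatch_rejected = True
print("mismatch rejected:", mismatch_rejected)
```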

Quantiles are essential to visualizing distributions. Which one of the following statements is true of quantiles? A. The precise form of f_i is important. B. No explicit rule is needed to compute q(f). C. A fraction f of the data is greater than q(f). D. The f-values provide a standard for comparison.

D. The f-values provide a standard for comparison.
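
To make the f-values concrete, here is a small Python sketch. It assumes the convention f_i = (i - 0.5)/n, under which the i-th smallest value x_(i) is taken as the f_i quantile, so that roughly a fraction f of the data lies at or below q(f). Because the f-values depend only on n, two batches of the same size (the numbers below are hypothetical) can be matched quantile by quantile, which is what a q-q plot displays.

```python
def f_values(n):
    """f_i = (i - 0.5) / n for i = 1, ..., n (the assumed convention)."""
    return [(i - 0.5) / n for i in range(1, n + 1)]

def quantile_pairs(data):
    """Pair each f_i with x_(i), the i-th smallest value, so q(f_i) = x_(i)."""
    return list(zip(f_values(len(data)), sorted(data)))

batch_a = [29, 3, 15, 39, 9]     # hypothetical batch
batch_b = [12, 18, 10, 25, 40]   # hypothetical batch of the same size

for f, q in quantile_pairs(batch_a):
    print(f"q({f:.1f}) = {q}")

# Same n means the same f-values, so quantiles can be matched one to one,
# which is what a quantile-quantile (q-q) plot displays.
qq_points = [
    (qa, qb)
    for (_, qa), (_, qb) in zip(quantile_pairs(batch_a), quantile_pairs(batch_b))
]
print(qq_points)
```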

Which one of the following is correct about exposure-related data? A. Policy inception and expiration dates cannot be considered exposure-related data. B. The measure of exposure does not vary by line of insurance. C. Deductibles and limits include insured value and payroll. D. The measure of exposure can include car years, insured value, payroll, and sales.

D. The measure of exposure can include car years, insured value, payroll, and sales.

In O'Neil's book Weapons of Math Destruction, what concern does O'Neil express about the use of predictive modeling by insurance companies as they gather and analyze more information about policyholders? A. Policyholder gender is used as a predictor. B. The models will be able to help everyone pay the average cost. C. The models may help eliminate subjective bias in insurance prices that could result from using variables such as race and gender in rate estimation. D. The models will be able to anticipate policyholder costs and charge for them in advance.

D. The models will be able to anticipate policyholder costs and charge for them in advance.

The accounting date on Insurance Services Office, Inc. (ISO) statistical plans is A. Required for loss but not premium records. B. As the month, quarter, or year of a loss payment. C. Separate from the insurer's Statutory Page 14 data. D. The month and year a financial transaction was recorded.

D. The month and year a financial transaction was recorded.

A data set of observations quantifying mobile phone battery life is skewed toward large values. It is most likely that A. The distribution is well-approximated by the normal distribution. B. The values on a quantile plot are symmetric. C. The median and the mean measure the same aspect of the distribution. D. The quantile plot displays a convex pattern.

D. The quantile plot displays a convex pattern.

Regarding correlations between variables, A. Linear correlation exists only when both variables are increasing or both are decreasing. B. The demand function is an example of a linear and positive correlation between variables. C. A correlation of zero in a correlation matrix strongly indicates negatively correlated variables. D. The variables are positively correlated when each variable rises or falls with the other.

D. The variables are positively correlated when each variable rises or falls with the other.

Durham Insurance is a consolidation of many small carriers. A challenging data issue is that each of the carriers had its data stored and processed on its proprietary legacy platforms. Having a dedicated data governance committee will provide which one of the following advantages for Durham? A. Coordinated and unified data management B. Stakeholder representation C. Consolidated deployment of resources D. Timely and integrated systems

D. Timely and integrated systems Having a dedicated data governance committee will provide timely and integrated systems, so its many stakeholders will all have current and relevant data.

According to Peng, the goal of exploratory data analysis is A. To create graphs that provide insights into the data. B. To understand how the data was produced. C. To identify questions about relationships between variables in the data. D. To get you thinking about your data and reasoning about your question.

D. To get you thinking about your data and reasoning about your question.

Which one of the following statements best explains the importance of transactional data? A. Transactional data is useful because its potential values are separated at equal intervals on a continuum. B. An insurer can view its transactional data in isolation as a means of analyzing its experience. C. Transactional data depicts a database's contents precisely as they existed at a particular moment. D. Transactional data captures the manner in which an element of the business changes.

D. Transactional data captures the manner in which an element of the business changes.

Which one of the following is true of unallocated loss adjustment expenses (ULAE)? A. ULAE represents an expense to settle one particular claim. B. ULAE is tracked at the individual policy level. C. ULAE is specific to one policy. D. ULAE may not be specific to any one product.

D. ULAE may not be specific to any one product.

Each Unicode codepoint has a(n) A. Specific number of code spaces. B. Different character string. C. Associated dingbat. D. Unique binary value.

D. Unique binary value.
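
A quick Python check of this idea, using three arbitrary characters: ord() returns each character's codepoint, a distinct integer that can be written in hexadecimal (the U+XXXX form) or in binary.

```python
# Three sample characters: "A", e-acute, and the euro sign (written as escapes).
chars = ["A", "\u00e9", "\u20ac"]
codepoints = [ord(ch) for ch in chars]

for cp in codepoints:
    # Each codepoint is one integer, shown here as U+XXXX and in binary.
    print(f"U+{cp:04X}", bin(cp))

# Distinct characters have distinct codepoints, and chr() reverses ord().
print(len(set(codepoints)) == len(chars), [chr(cp) for cp in codepoints] == chars)
```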

Which one of the following types of variables would be most appropriately represented graphically by a set of distinct colors? A. Ordered categorical B. Continuous numerical C. Discrete numerical D. Unordered categorical

D. Unordered categorical

An independent state bureau is examining data in a unit statistical plan (USP). The USP indicates the underwriting experience (premiums and losses) collected for which type of insurance? A. Surety B. Personal auto C. General liability D. Workers compensation

D. Workers compensation

Quantitative understanding is gained by A. Talking to various professionals. B. Having qualitative understanding of the data. C. Testing all the data. D. Working on the details of the data itself.

D. Working on the details of the data itself.

What command will turn each data point the color indicated by the variable Color? A. ggplot(data = DataFrame, mapping = aes(x = X_Values, y = Y_Values)) + geom_point(color=Color) B. ggplot(data = DataFrame, mapping = aes(x = X_Values, y = Y_Values)) + geom_point(mapping=Color)) C. ggplot(data = DataFrame, mapping = aes(x = X_Values, y = Y_Values)) + geom_point(color="Color") D. ggplot(data = DataFrame, mapping = aes(x = X_Values, y = Y_Values, color = Color)) + geom_point()

D. ggplot(data = DataFrame, mapping = aes(x = X_Values, y = Y_Values, color = Color)) + geom_point()

You have a data frame, named rsv_chgs, with a column called TRANSACTION_YEAR and another column called RESERVE_CHANGE. You would like to calculate the total of the RESERVE_CHANGE column for each TRANSACTION_YEAR. Which one of the following code snippets will calculate those numbers for you using the dplyr package? A. summarize(rsv_chgs, 'Total Change' = sum(RESERVE_CHANGE)) %>% group_by(TRANSACTION_YEAR) B. group_by(rsv_chgs, TRANSACTION_YEAR) %<% summarize(rsv_chgs, 'Total Change' = sum(RESERVE_CHANGE)) C. summarize('Total Change' = sum(RESERVE_CHANGE)) %<% group_by(rsv_chgs, TRANSACTION_YEAR) D. group_by(rsv_chgs, TRANSACTION_YEAR) %>% summarize('Total Change' = sum(RESERVE_CHANGE))

D. group_by(rsv_chgs, TRANSACTION_YEAR) %>% summarize('Total Change' = sum(RESERVE_CHANGE))

Which one of the following is correct regarding the APIs that data scientists use? A. Only the United States government allows APIs to be used on its websites. B. Custom APIs are permitted if they conform to the website owner's guidelines. C. The only APIs available are those designed and permitted by a website's owner. D. There are no specific rules or guidelines regarding how APIs can be used.

B. Custom APIs are permitted if they conform to the website owner's guidelines.

Data and process flow diagrams are especially helpful for which one of the following? A. Master data B. Data lineage C. Big data D. Metadata repository

B. Data lineage

How are data mining goals different from business objectives? A. Business objectives should be used instead of data mining goals. B. Data mining goals state project objectives in technical terms. C. Data mining goals state project objectives in business terminology. D. Data mining goals are the same as business objectives.

B. Data mining goals state project objectives in technical terms.

Which one of the following is true of data science? A. Data science is no longer considered an experimental pursuit. B. Data science and its methods evolve rapidly. C. Data science is another name for the scientific method. D. Data science encompasses data processing but not data analytics.

B. Data science and its methods evolve rapidly.

The skill set found in the intersection of mathematics and statistics, computer programming, and domain knowledge is most important to a(n) A. Mathematician. B. Data scientist. C. Actuary. D. Computer programmer.

B. Data scientist.

Data governance strives to A. Encourage siloed decision making. B. Encourage enterprise-wide data management. C. Discourage handling of big data. D. Discourage cross-functional communication.

B. Encourage enterprise-wide data management.

Grandview Insurance has recently created a Data Dashboard to provide a snapshot, from a data perspective, of each of its departments. It shows the relationships and interdependencies of its data stakeholders and serves as a type of blueprint for its data organization. The Data Dashboard is an example of which one of these data governance tools? A. Collaboration tool B. Enterprise data model C. External policies and procedures D. Internal policies and procedures

B. Enterprise data model

Measurements of information gain are based on A. Target variables. B. Entropy. C. The chi-squared statistic. D. Classification trees.

B. Entropy.
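
Information gain is commonly computed as the entropy of the parent set minus the size-weighted entropy of the subsets produced by a split. A minimal Python sketch, using made-up fraud/legit labels:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over class proportions p."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child subsets."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["fraud"] * 4 + ["legit"] * 4          # 50/50 mix: entropy of 1 bit
weak_split = [["fraud"] * 3 + ["legit"], ["fraud"] + ["legit"] * 3]
pure_split = [["fraud"] * 4, ["legit"] * 4]     # each subset contains one class

print(round(entropy(parent), 4))                        # 1.0
print(round(information_gain(parent, weak_split), 4))   # 0.1887
print(round(information_gain(parent, pure_split), 4))   # 1.0
```

A split that leaves each subset as mixed as the parent gains nothing, while a split into pure subsets recovers all of the parent's entropy.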

Test data construction should A. Be as simplified as possible. B. Err on the side of being complex rather than simplified. C. Include all the data in a large database. D. Allow quantitative, but not qualitative, understanding.

B. Err on the side of being complex rather than simplified.

The inception date on the Insurance Services Office, Inc. (ISO) statistical plan A. Is used to assign premium and loss records to the proper accident year. B. Establishes the codes that are in effect at a particular point in time. C. Corresponds to the policy effective date and is required only for premium transactions. D. Is equivalent to the date of loss and required only for loss transactions.

B. Establishes the codes that are in effect at a particular point in time.

The primary purpose of statistical plans is to A. Ensure that every captured data element is used in the rating process. B. Facilitate aggregation of historical insurance statistics. C. Provide individual standards to different insurers for data collection. D. Make sure that a policy is rated in one process and coded in another.

B. Facilitate aggregation of historical insurance statistics.

The purpose for the development of JSON was to A. Make Ecma the international standard for programming languages. B. Facilitate the interchange of structured data among all programming languages. C. Create one universal programming language. D. Make Unicode the international standard for all programming languages.

B. Facilitate the interchange of structured data among all programming languages.
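
A minimal Python illustration, using a hypothetical claim record: the standard json module turns native structures into a plain-text JSON string that any JSON-aware language could parse, and then parses it back.

```python
import json

# A hypothetical claim record built from native Python types.
claim = {
    "claim_id": 1001,
    "line": "auto",
    "paid": 2500.0,
    "open": False,
    "parties": ["insured", "claimant"],
}

text = json.dumps(claim)       # serialize to a language-neutral JSON string
print(text)                    # note that Python's False becomes JSON's false

restored = json.loads(text)    # parse the string back into Python objects
print(restored == claim)       # True
```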

An analyst is reviewing key-value pairs, the basis for storage and retrieval of information in a hash table. Which one of the following, based on the requirements for keys and values, is an appropriate pair? A. Color is the key, and fifty is the value. B. Gender is the key, and male is the value. C. Fifty is the key, and age is the value. D. Denver is the key, and city is the value.

B. Gender is the key, and male is the value.
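
In Python, a dict is a hash table, so the pairing can be sketched directly. The record below is hypothetical. Each key names an attribute and its value is an instance of that attribute, which is why "Gender" and "male" fit together while, for example, fifty is not a value of Color.

```python
# A Python dict is a hash table: each key is hashed to find where its value is stored.
policyholder = {
    "Gender": "male",    # key = attribute name, value = that attribute's value
    "Age": 50,
    "City": "Denver",
}

print(policyholder["Gender"])      # male
print(policyholder.get("Color"))   # None: no such key in this record
```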

A multivariate display that uses colors effectively either in different boxes or squares, or in a more fluid design, is a A. Pivot table. B. Heat map. C. Contingency table. D. Correlation matrix.

B. Heat map.

According to Loukides et al., good case studies give us the opportunity to think through problems before facing them in real life. There are four case studies from Princeton's Center for Information Technology Policy: Automated Healthcare App, Dynamic Sound Identification, Optimizing Schools, and Law Enforcement Chatbots. Which one of the following statements best describes the issues discussed in the Automated Healthcare App case study? I. It raises the issue of how decisions to use data are made: who needs to be informed about the decisions, and what are the consequences when people find out how their data has been used? II. It raises issues like paternalism, consent, and even language choices. III. It raises issues about the trade-off between liberty and security, entrapment, openness and accountability, and compliance with international law. IV. It uncovered an online marketplace in which cybercriminals advertised and traded stolen identities across state lines. A. I B. II C. III D. IV

B. II

An approach to managing problems associated with large amounts of data is A. Increasing the amount of normalization. B. Inserting bulk data into a table in the order of the primary key. C. Referring to an index many times to speed up the process. D. Organizing the data by many different indexes and columns.

B. Inserting bulk data into a table in the order of the primary key.

Which one of the following statements best characterizes the disadvantage of comparing objects by their hash values? A. It cannot avoid collisions. B. It requires computing the hash value for each object. C. It cannot detect possible file corruption. D. It is ineffective for situations in which the elements of a set of objects will be compared with each other repeatedly.

B. It requires computing the hash value for each object.
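
The trade-off can be sketched in Python with hashlib (the records are made up). Every object must be hashed before any comparison, which is the overhead the answer refers to; afterward, comparisons use short digests. Equal digests strongly suggest, but do not strictly prove, equal objects, since collisions are possible in principle.

```python
import hashlib

def digest(data: bytes) -> str:
    """SHA-256 digest of the given bytes, as a hex string."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical records; the first and third are identical.
records = [b"claim 1001: paid 2500", b"claim 1002: paid 900", b"claim 1001: paid 2500"]

# Up-front cost: every record is hashed once before any comparison is made.
digests = [digest(r) for r in records]

# Afterward, pairwise checks compare short digests rather than whole records.
duplicates = [
    (i, j)
    for i in range(len(records))
    for j in range(i + 1, len(records))
    if digests[i] == digests[j]
]
print(duplicates)  # [(0, 2)]
```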

An analyst, Joel, is preparing and cleaning collected raw data for analysis and later use in visual displays. Which one of the following will facilitate his efforts? A. Because his audience is unfamiliar with this data, Joel introduces his spreadsheet with two paragraphs of clarifying instructions. B. Joel adds an additional final column for consolidating individual cell annotations. C. Joel wants his audience to understand that he has very little data to work with and so leaves many cells empty. D. Joel recognizes that field headers, no matter the length, are imperative to fully describe each of the variables.

B. Joel adds an additional final column for consolidating individual cell annotations.

Which one of the following ways to deal with missing values involves deleting an instance/record that has any missing data? A. Pairwise deletion B. Listwise, or casewise, deletion C. Dummy variables D. Imputation

B. Listwise, or casewise, deletion
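
A small Python sketch with hypothetical records (None marks a missing value) contrasts listwise deletion with pairwise deletion:

```python
# Hypothetical policyholder records; None marks a missing value.
records = [
    {"age": 34, "vehicle_year": 2015, "claims": 0},
    {"age": None, "vehicle_year": 2018, "claims": 1},
    {"age": 51, "vehicle_year": None, "claims": 2},
    {"age": 45, "vehicle_year": 2020, "claims": 0},
]

# Listwise (casewise) deletion: drop any record with at least one missing field.
listwise = [r for r in records if all(v is not None for v in r.values())]

# Pairwise deletion: for an age-vs-claims analysis, only those two fields must be present.
pairwise_age_claims = [
    r for r in records if r["age"] is not None and r["claims"] is not None
]

print(len(listwise), len(pairwise_age_claims))  # 2 3
```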

Which one of the following is true regarding the loss amount, one of the amount fields on the Insurance Services Office, Inc. (ISO) statistical plan? A. Loss amounts are reported net of reinsurance. B. Loss amounts are reported net of salvage and subrogation. C. Paid and outstanding losses for each claim are included together in one transaction. D. Outstanding loss amounts include reserves for incurred but not reported losses.

B. Loss amounts are reported net of salvage and subrogation.

Which one of the following is true regarding data anomalies? A. Outliers can be random or systematic. B. Many variables in insurance datasets are noisy. C. Missing data can only result when values are not supplied. D. Noisy data is another name for outliers.

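B. Many variables in insurance datasets are noisy. (Study note: the random/systematic distinction in option A may apply to anomalies or errors in general rather than to outliers specifically.)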

Which one of the following can help with validating that the dataset has the right data? A. Target variables B. Metadata C. Telematics D. Projected severity

B. Metadata

When a database is eventually used for modeling, both the training data and the test data A. Must have binary target variables. B. Must have known values for the model's target variable. C. Should be simple. D. Will be used to test the model.

B. Must have known values for the model's target variable.

Which one of the following is a problem a data scientist may encounter when preparing unstructured data for analysis? A. Volume B. Noise C. Variety D. Velocity

B. Noise

Which one of the following is true regarding hash functions? A. Hash functions have one simple role in data science. B. Output values are also called digests or hash values. C. Common hash functions have output values that are negative integers. D. Hash functions only accept certain types of data of limited lengths.

B. Output values are also called digests or hash values.
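
A short Python check with hashlib's SHA-256: inputs of any length, from empty to a million bytes, produce a digest of the same fixed length, and the digest read as an integer is never negative.

```python
import hashlib

messages = [b"", b"claim", b"x" * 1_000_000]
digests = [hashlib.sha256(m).hexdigest() for m in messages]

for m, d in zip(messages, digests):
    # SHA-256 always yields 32 bytes, i.e. 64 hexadecimal characters.
    print(len(m), len(d), d[:16])

# Read as an integer, a digest is a non-negative number.
print(all(int(d, 16) >= 0 for d in digests))
```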

Emilio is a claims adjuster working for a small insurance carrier. He is responsible for setting the reserves on claims when they are first reported. For which one of these scenarios would Emilio rely on automation to make a reliable prediction for reserves? A. A bodily injury claim that is first reported as a lawsuit B. Physical damage claims from two of Emilio's insureds involved in one accident C. An auto liability claim with conflicting accounts from witnesses D. An accident resulting in two deaths and one seriously injured claimant

B. Physical damage claims from two of Emilio's insureds involved in one accident Emilio would rely on automation to make a reliable prediction of reserves for the physical damage claims from two of Emilio's insureds involved in one accident. The other scenarios suggest additional variables and longer resolution times that would make routine reserving by automation less predictable.

Which approach to data analytics involves developing a method that can be used repeatedly to facilitate data-driven decision making? A. Traditional approach B. Predictive approach C. Descriptive approach D. One-time approach

B. Predictive approach

Jason is a data scientist gathering data about his company's customer service response times. He is trying to decide how to present his findings, which show excellent results for eleven months but strikingly poor ones for the month of November. Averaging the yearly response times would lower the otherwise stellar performance of this department, so Jason decides to present the results by month, with a note that November's three snowstorms and a flu epidemic took their toll on the availability of employees and thus on response times. Which category of data quality is Jason particularly exemplifying in his decision? A. Validity B. Reasonability C. Accuracy D. Completeness

B. Reasonability

Data reduction can be accomplished by A. Documenting data preparation work. B. Removing variables from the modeling dataset that have no predictive value. C. Creating a training sample. D. Testing the model on holdout data.

B. Removing variables from the modeling dataset that have no predictive value.

Bar graphs typically A. Show the independent variables on the y-axis on a vertical bar graph. B. Reveal patterns in categorical data by showing relative size, colors, or ranges. C. Use varying widths for the bars within a graph with no spaces between the bars. D. Show the dependent variable values on the x-axis on a vertical bar graph.

B. Reveal patterns in categorical data by showing relative size, colors, or ranges.

Which one of the following is true regarding querying data using Structured Query Language (SQL)? A. Few other programming languages include SQL capability. B. SQL has become a standard for relational databases. C. SQL is widely used to work with data in any type of database. D. SQL is flexible without the rules and syntax required of other languages.

B. SQL has become a standard for relational databases.

Which one of the following is an aggregate function used in Structured Query Language (SQL) to aggregate data? A. JOIN B. SUM C. SELECT D. WHERE

B. SUM
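
A minimal sqlite3 sketch. The table reuses the names rsv_chgs, TRANSACTION_YEAR, and RESERVE_CHANGE from the dplyr question above, but the rows are invented. SUM collapses many rows into one value; adding GROUP BY gives one total per group, the same idea as group_by() followed by summarize().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rsv_chgs (TRANSACTION_YEAR INTEGER, RESERVE_CHANGE REAL)")
conn.executemany(
    "INSERT INTO rsv_chgs VALUES (?, ?)",
    [(2019, 100.0), (2019, -25.0), (2020, 40.0), (2020, 60.0)],  # invented rows
)

# SUM alone: one total over every row.
total = conn.execute("SELECT SUM(RESERVE_CHANGE) FROM rsv_chgs").fetchone()[0]

# SUM with GROUP BY: one total per TRANSACTION_YEAR.
by_year = conn.execute("""
    SELECT TRANSACTION_YEAR, SUM(RESERVE_CHANGE) AS total_change
    FROM rsv_chgs
    GROUP BY TRANSACTION_YEAR
    ORDER BY TRANSACTION_YEAR
""").fetchall()

print(total)    # 175.0
print(by_year)  # [(2019, 75.0), (2020, 100.0)]
```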

If the goal of data mining is known and used in a model, the process is called A. Unsupervised statistics. B. Supervised learning. C. Descriptive modeling. D. Entropy.

B. Supervised learning.

Which one of the following entities offers analysis of filings truckers make with the United States Department of Transportation? A. Equifax B. The Central Analysis Bureau C. Dun & Bradstreet D. Experian

B. The Central Analysis Bureau

Which one of the following is true regarding the Gramm-Leach-Bliley (GLB) Act? A. For purposes of the GLB Act, a consumer is someone who has an ongoing relationship with the institution. B. The GLB Act aims to protect information collected about individuals, but not about businesses. C. The GLB Act applies to financial institutions, such as tax-preparation services, but not to insurance. D. The GLB Act requires customers to write a letter as the only way to opt out of having information shared with certain parties.

B. The GLB Act aims to protect information collected about individuals, but not about businesses.

Which one of the following is true regarding statistical plans in general? A. Large insurers with much available data especially rely on generated statistical plan data to make credible pricing decisions. B. The database created by aggregating historical insurance statistics is used to analyze insurance costs. C. A statistical plan is an informal set of directions for reporting various items to a statistical agent. D. Using the code-as-rated approach, data elements used to rate the policy are readily available but hard to verify.

B. The database created by aggregating historical insurance statistics is used to analyze insurance costs.

Raul is an insurance adjuster handling many claims. Because of security concerns, his organization has issued strict guidelines for maintaining his policyholders' and claimants' personally identifiable information (PII). One claimant recently discovered that his PII, only disclosed because of his recent accident, had been compromised. Raul was not found to be remiss in his security efforts. Which other party is most likely responsible? A. The data scientist compiling the policy information for analysis B. The intermediary who first took the claimant's loss information C. The producer working solely as a consultant D. An agency in the process of premium collection

B. The intermediary who first took the claimant's loss information The intermediary who first took the claimant's loss information is most likely responsible, as he or she would need to safeguard the PII along with the adjuster. In the scenario described, the data scientist, producer, or agency would not have had the PII specific to the one recent accident.

Which one of the following is an example of a piece of quantitative data? A. An assessment of an auto insured's likelihood of being involved in an auto accident B. The number of auto insureds in a particular region C. Five auto insureds arranged according to their likelihood of being in an accident D. The color of an auto insured's vehicle

B. The number of auto insureds in a particular region

Which one of the following is true regarding indexes, one of the features of Structured Query Language (SQL)? A. A difficulty for data scientists is that few database servers allow users to look at indexes. B. The optimization feature automatically selects the most effective index. C. A primary key is considered effective for any large table. D. Small tables generally require indexes.

B. The optimization feature automatically selects the most effective index.

In Healy, there is a case study related to a slide created to evaluate Marissa Mayer's performance as CEO of Yahoo. What is the biggest problem noted about this slide in the text? A. Dual axes can be scaled to misrepresent the association in the variables B. The overall message of the slide is unclear. C. For this topic it is not appropriate to have time on the x axis. D. The color theme doesn't align with best practices for preattentive processing.

B. The overall message of the slide is unclear.

Which one of the following best describes the table structure in a relational database? A. Each column in the table represents a unique instance of data. B. The tables in a relational database have a logical connection to each other. C. Each row in the table corresponds to a data field. D. Insurers exclusively make use of relational databases.

B. The tables in a relational database have a logical connection to each other.

Which one of the following is true regarding the transaction effective date and transaction expiration date on Insurance Services Office, Inc. (ISO) statistical plans? A. The transaction effective date always corresponds to the inception date. B. The transaction expiration date is reported on premium records only. C. The transaction effective date is reported on loss records only. D. The transaction expiration date is always the date coverage is canceled.

B. The transaction expiration date is reported on premium records only.

Which one of the following is the best order for building a plot by layer, as described by Healy? A. Tidy data, geom, mapping, co-ordinates & scales, labels & guides B. Tidy data, mapping, geom, co-ordinates & scales, labels & guides C. Tidy data, mapping, geom, labels & guides, co-ordinates & scales D. Tidy data, geom, mapping, labels & guides, co-ordinates & scales

B. Tidy data, mapping, geom, co-ordinates & scales, labels & guides

When defining types of data anomalies, A. The threshold for determining what is considered an outlier is consistent for any project. B. Truncation may refer to a type of data anomaly and also be considered a treatment. C. Systematic error results from unknown causes and may be accounted for within an acceptable range. D. Random error has an observable pattern with an identifiable cause and applicable remedy.

B. Truncation may refer to a type of data anomaly and also be considered a treatment.

Web scraping transforms A. Small amounts of data from the internet. B. Unstructured data into structured data. C. Structured data into relational databases. D. Internet data into a library.

B. Unstructured data into structured data.
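
As a hedged sketch of that transformation, Python's standard html.parser can pull the cells of an HTML table into a list of rows. The HTML string is a made-up stand-in for a downloaded page (fetching is omitted), and TableCellParser is a hypothetical helper written for this example, not a library class.

```python
from html.parser import HTMLParser

class TableCellParser(HTMLParser):
    """Hypothetical helper: collect the text of <td> cells, row by row."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []          # start a new row
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row:
            self.rows.append(self._row)   # keep only rows that had <td> cells

    def handle_data(self, data):
        if self._in_cell and data.strip():
            self._row.append(data.strip())

# Made-up page content standing in for scraped HTML.
html = """<html><body><p>Quarterly rates</p>
<table><tr><th>State</th><th>Rate</th></tr>
<tr><td>OH</td><td>1.02</td></tr><tr><td>PA</td><td>0.97</td></tr></table>
</body></html>"""

parser = TableCellParser()
parser.feed(html)
parser.close()
print(parser.rows)  # [['OH', '1.02'], ['PA', '0.97']]
```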

Claims representative Simone is investigating a recent claim with differing accounts from the insured, the claimant, and several witnesses to an accident. In reviewing some of the claimant's social media postings, Simone observed that the claimant bragged about his propensity to speed and his many unpaid parking and speeding violations, prompting Simone to look further into his past. This new information coming to Simone is an example of A. Structured external data. B. Unstructured external data. C. Unstructured internal data. D. Structured internal data.

B. Unstructured external data.

Which one of the following Structured Query Language (SQL) features is sometimes used to create a function in a different language? A. Normalization B. User-defined functions C. Null values D. Optimization

B. User-defined functions
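
SQLite offers one way to see this from Python: the sqlite3 module's create_function registers a Python function so that SQL queries can call it. The function loss_band and the claims table are hypothetical.

```python
import sqlite3

def loss_band(amount):
    """Hypothetical rule written in Python, then called from SQL row by row."""
    if amount is None:
        return None
    return "large" if amount > 10000 else "small"

conn = sqlite3.connect(":memory:")
conn.create_function("loss_band", 1, loss_band)   # name in SQL, arg count, Python callable
conn.execute("CREATE TABLE claims (claim_id INTEGER, loss REAL)")
conn.executemany("INSERT INTO claims VALUES (?, ?)", [(1, 2500.0), (2, 18000.0)])

rows = conn.execute(
    "SELECT claim_id, loss_band(loss) FROM claims ORDER BY claim_id"
).fetchall()
print(rows)  # [(1, 'small'), (2, 'large')]
```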

Which one of the following refers to the number of changes in a claim? A. Identity B. Velocity C. Mass D. Acceleration

B. Velocity

With an EQUAL JOIN statement, which one of the following is used to denote the conditions that specifically limit the data to be retrieved (such as a loss greater than $10,000)? A. JOIN B. WHERE C. FROM D. SELECT

B. WHERE
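
A minimal sqlite3 sketch with hypothetical policies and claims tables: the ON clause supplies the equality condition for the join, and WHERE restricts the result to losses greater than $10,000.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE policies (policy_id INTEGER, state TEXT);
    CREATE TABLE claims (claim_id INTEGER, policy_id INTEGER, loss REAL);
    INSERT INTO policies VALUES (10, 'OH'), (11, 'PA');
    INSERT INTO claims VALUES (1, 10, 2500.0), (2, 10, 18000.0), (3, 11, 12500.0);
""")

rows = conn.execute("""
    SELECT c.claim_id, p.state, c.loss
    FROM claims c
    JOIN policies p ON c.policy_id = p.policy_id   -- equality join condition
    WHERE c.loss > 10000                           -- limits which rows are retrieved
    ORDER BY c.claim_id
""").fetchall()
print(rows)  # [(2, 'OH', 18000.0), (3, 'PA', 12500.0)]
```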

A claims triage model predicts A. The granularity of data. B. Which claims will be more complex to handle. C. Claims professionals' opinions about the value of the data. D. Fraudulent and questionable claims.

B. Which claims will be more complex to handle.

Which one of the following is correct regarding XML? A. XML uses tags to denote specific formats. B. XML tags provide metadata. C. XML has a limited, predefined set of tags. D. XML does not use tags for markup.

B. XML tags provide metadata.
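
A short sketch with Python's xml.etree.ElementTree and a hypothetical claim document: the tag names (and attributes such as currency) are chosen by the document's author and describe the values they enclose, which is the sense in which XML tags carry metadata.

```python
import xml.etree.ElementTree as ET

# Hypothetical claim document; the tag names are defined by its author, not by XML itself.
doc = """<claim>
  <claim_id>1001</claim_id>
  <loss currency="USD">2500</loss>
</claim>"""

root = ET.fromstring(doc)
for child in root:
    # The tag (and any attributes) describe the data; the text is the data itself.
    print(child.tag, child.attrib, child.text)

fields = {child.tag: child.text for child in root}
print(fields)  # {'claim_id': '1001', 'loss': '2500'}
```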

In Perl, which one of the following not only finds matches but can also replace each match found? A. Split function B. s/// operator C. Alternating matches D. Grep function

B. s/// operator
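
Python has no s/// operator, but re.sub plays a comparable role. The sketch below pairs a Perl substitution (shown only in a comment) with its assumed Python counterpart on a made-up string, and contrasts it with matching alone.

```python
import re

text = "PD claim; PD reserve"

# Perl equivalent, for comparison:  $text =~ s/PD/Property Damage/g;
replaced = re.sub(r"PD", "Property Damage", text)   # replaces every match

# Matching alone only finds the matches and leaves the text unchanged.
found = re.findall(r"PD", text)

print(replaced)  # Property Damage claim; Property Damage reserve
print(found)     # ['PD', 'PD']
```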

You have developed a scatter plot using the ggplot2 plotting system. The y-axis currently goes from 0 to 1,000; however, all of the interesting information lies in the range where y is between 0 and 42. Which one of the following can be added to your code for the existing plot so that the y-axis and the data will be limited to values between 0 and 42? A. + axis(y, 0, 42) B. + limit( yvar<42 and yvar>= 0 ) C. + ylim( 0, 42 ) D. + subset( yvar<= 42 and yvar>= 0 )

C. + ylim( 0, 42 )

Consider the predictive value of the following tests in determining a prospective employee's job performance, as discussed in chapter six, "Ineligible to Serve," of O'Neil's book Weapons of Math Destruction: 1. Personality tests 2. Reference checks 3. Cognitive tests. What rank ordering is described? A. 1 < 3 < 2 B. 3 < 1 < 2 C. 1 < 2 < 3 D. 3 < 2 < 1

C. 1 < 2 < 3

If the regular expression of "/PD$/" appears in a string matching application, which one of the following would be a match? A. 20PD18 B. 2P018D C. 2018PD D. PD2018

C. 2018PD
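
The anchor can be checked in Python, whose re module also treats $ as an end-of-string anchor in this case:

```python
import re

candidates = ["20PD18", "2P018D", "2018PD", "PD2018"]

# `$` anchors the pattern at the end of the string, so "PD" must be the final two characters.
matches = [s for s in candidates if re.search(r"PD$", s)]
print(matches)  # ['2018PD']
```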

An analyst receives the following data: 3, 9, 15, 23, 29, 29, 39. The mean, median, and mode are respectively A. 20, 21, and 39. B. 23, 29, and 39. C. 21, 23, and 29. D. 21, 23, and 30.

C. 21, 23, and 29.
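
The arithmetic can be confirmed with Python's statistics module: the values sum to 147, and 147 / 7 = 21; the median is the 4th of the 7 sorted values, 23; and 29 is the only value that appears twice.

```python
import statistics

data = [3, 9, 15, 23, 29, 29, 39]

mean = statistics.mean(data)      # 147 / 7 = 21
median = statistics.median(data)  # middle (4th) of 7 sorted values = 23
mode = statistics.mode(data)      # most frequent value = 29

print(mean, median, mode)
```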

Which one of the following statements is true regarding color used in data visualizations? A. Color helps the audience effectively distinguish between small dots of various sizes. B. Several competing colors can add busyness and provide additional significance. C. A color, such as red, may mean different things to an international audience. D. Gray, green, and brown are considered warm colors on the color wheel.

C. A color, such as red, may mean different things to an international audience.

Imputation is A. An approach under which deletion is considered separately for each pair of variables. B. A way of deleting any instance/record that has missing data. C. A common method that data scientists use to fill in values for missing data based on the other data in the database. D. The process of creating a binary dummy variable to use as an independent variable in analysis.

C. A common method that data scientists use to fill in values for missing data based on the other data in the database.
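
Mean imputation is one simple instance of this approach; median or model-based imputation are common alternatives. A minimal Python sketch with hypothetical ages:

```python
import statistics

ages = [34, None, 51, 45, None, 40]  # hypothetical ages; None marks a missing value

observed = [a for a in ages if a is not None]
fill_value = statistics.mean(observed)   # (34 + 51 + 45 + 40) / 4 = 42.5

# Mean imputation: replace each missing value with the mean of the observed values.
imputed = [a if a is not None else fill_value for a in ages]
print(imputed)  # [34, 42.5, 51, 45, 42.5, 40]
```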

One disadvantage of the base plotting system in R, when compared to the lattice and ggplot2 systems, is that A. Annotation in panels in plots is not especially intuitive and can be difficult to explain. B. It's not useful for much more than "quickly getting some data on the screen." C. A graph's layout must be planned out in advance to ensure all the features display as desired. D. Plots are entirely specified with a single function call, which can sometimes be very awkward.

C. A graph's layout must be planned out in advance to ensure all the features display as desired.

Which one of the following assigns a numeric value to states? A. Zip code tabulation area B. Census block group C. American National Standards Institute code D. Census tract

C. American National Standards Institute code

A variable value that cannot be filled because the real-world value does not exist is A. A clustered value. B. A missing value. C. An empty value. D. An imperfect value.

C. An empty value.

An inaccurate or invalid value is A. An outlier. B. Noisy data. C. An error. D. Missing data.

C. An error.

Which one of the following statements most accurately reflects a potential disadvantage to using external insurance data? A. External data cannot be used to produce loss development factors. B. Insurers must wait for loss data to mature before revising rates. C. As data analysts use larger quantities of data, they will encounter more unstructured data and data that needs to be "cleaned" before it can be used with the insurer's own data. D. External claims data is not useful for establishing rates for a new line of business.

C. As data analysts use larger quantities of data, they will encounter more unstructured data and data that needs to be "cleaned" before it can be used with the insurer's own data.

As a condition used with a WHERE statement, which one of the following is used to denote a range of values that includes the endpoints? A. = B. >= C. BETWEEN D. NOT

C. BETWEEN
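BETWEEN includes both endpoints. Here is a minimal sketch using Python's built-in sqlite3 module. The claims table and its values are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical claims table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (claim_id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO claims (claim_id, amount) VALUES (?, ?)",
    [(1, 500.0), (2, 1000.0), (3, 2500.0), (4, 5000.0), (5, 7500.0)],
)

# BETWEEN is inclusive, so the endpoint values 1000 and 5000 are returned
between_ids = [row[0] for row in conn.execute(
    "SELECT claim_id FROM claims WHERE amount BETWEEN 1000 AND 5000 ORDER BY claim_id")]

# Equivalent explicit form using >= and <=
explicit_ids = [row[0] for row in conn.execute(
    "SELECT claim_id FROM claims WHERE amount >= 1000 AND amount <= 5000 ORDER BY claim_id")]

print(between_ids)  # [2, 3, 4]
```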

A summary-based statistical plan A. Makes compilation of data by the statistical agent difficult and confusing. B. Collects additional detailed information to support the business need for data. C. Balances the regulatory need for information with the cost of compiling detailed data. D. Cannot include details beyond what is needed for regulatory purposes.

C. Balances the regulatory need for information with the cost of compiling detailed data.

Martina is an analyst using a robust measure called the trimmed mean to discover atypical values for a particular variable. Her calculation with a 10 percent trimmed mean would A. Use data from the highest 90 percent of the data points. B. Use data from the lowest 90 percent of the data points. C. Be made using 80 percent of the data points. D. Involve her selection of any 80 percent of the data points.

C. Be made using 80 percent of the data points.
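A 10 percent trimmed mean sorts the values, drops the lowest 10 percent and the highest 10 percent, and averages the remaining 80 percent. Below is a minimal sketch in Python. The helper name and the claim amounts are illustrative assumptions.

```python
def trimmed_mean(values, proportion=0.10):
    """Drop `proportion` of the sorted values from EACH tail, then average the rest."""
    data = sorted(values)
    k = int(len(data) * proportion)  # number of points trimmed from each end
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept), len(kept)

# Ten hypothetical claim amounts with one extreme value
amounts = [100, 120, 130, 140, 150, 160, 170, 180, 190, 10_000]
tmean, n_kept = trimmed_mean(amounts, 0.10)
print(tmean, n_kept)  # 155.0 8
```

The untrimmed mean of the same values is 1,134 because the single extreme value pulls it up. A large gap between the two means is one signal of atypical values.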

Which one of the following best describes the category into which clickstream and internet user data fall? A. Demographic data B. Customer financial data C. Behavioral data D. Census data

C. Behavioral data

A telephone number in a SQL table would be which one of the following types of data? A. Numeric B. Boolean C. Character D. Temporal

C. Character

When data is portrayed through the use of a graph, certain data points will group together in A. Data cubes. B. Imputations. C. Clusters. D. Dummy variables.

C. Clusters.

To encode text using Unicode, A. Diacritics must be used to link the text to the Unicode website. B. The corresponding International Organization for Standardization characters must be used. C. Codepoints must first be located in the appropriate Unicode charts. D. Dingbats must be discovered and deleted from the text to ensure accurate coding.

C. Codepoints must first be located in the appropriate Unicode charts.
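A code point is the number that the Unicode charts assign to a character. An encoding such as UTF-8 then defines how that number is stored as bits. The short Python sketch below uses the built-in ord() to look up each code point. The helper name is hypothetical.

```python
def describe(ch: str) -> str:
    cp = ord(ch)                      # Unicode code point as an integer
    utf8 = ch.encode("utf-8")         # UTF-8 is one encoding form among several
    bits = " ".join(f"{b:08b}" for b in utf8)
    return f"U+{cp:04X} utf-8={utf8.hex()} bits={bits}"

# "A", e with acute accent, and the euro sign
for ch in ["A", "\u00e9", "\u20ac"]:
    print(ch, describe(ch))
```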

Managing data quality by always comparing the number of claims reported from each branch office with an independent audit report ensures that the data from the branch office fulfills the requirements of which category of data quality? A. Timeliness B. Reasonability C. Completeness D. Data lineage

C. Completeness

Which one of the following is an example of internal data? A. Demographic information from government databases B. Information obtained from census data C. Computer claims codes about a claimant's previous accidents D. Weather conditions to confirm or deny a lightning loss

C. Computer claims codes about a claimant's previous accidents

A data scientist is working on a project to identify current claims handling processes and opportunities for improvement. Which one of the following types of data would be most useful? A. Claims adjusters' file notes B. Recorded statements of claimants C. Computer logs of claims activities D. Videos of accident scenes and surveillance

C. Computer logs of claims activities

Which one of the following is true regarding statistical records? A. Premium and loss transactions are recorded by the same process. B. Statistical records are subject to federal rules for uniformity. C. Consistency in form facilitates aggregating files from other insurers. D. Statistical records create operational data.

C. Consistency in form facilitates aggregating files from other insurers.

Which one of the following statements most aligns with Tufte's fundamental principles of analytic graphics? A. Data graphics should include text descriptions to help the reader interpret the plot. B. Data graphics and analysis should be driven by the tools and available software. C. Data graphics should be appropriately documented with labels, scales, and sources. D. Data graphics can aid in making your conclusions shine with clarity even when they are based on poor data.

C. Data graphics should be appropriately documented with labels, scales, and sources.

Millstone Insurance Company keeps finding errors in its reported auto physical damage loss amounts; some data reflects the insured's deductible reducing the amount paid, but other times, the deductible is not taken into account. Data scientists retrace the data by going back through the various locations and systems it touched. They discover that a new employee at a field office was responsible for data miscoding when claims were first reported. Millstone discovered the reason for the errors through A. Timely data. B. Acceptability thresholds. C. Data lineage. D. Meaningful metrics.

C. Data lineage.

The highlight of a display should be the A. Shading. B. Gridlines. C. Data. D. Borders.

C. Data.

A threshold classifier essentially makes a yes/no decision, putting things in one category or another. Loan decisions for different groups were discussed by Wattenberg et al. Which one of the following strategies is described in the paper? A. True positive rate means that, of the people who can pay back a loan, the same fraction in each group should actually be granted a loan. B. Group-unaware requires the threshold to be the same for each group. C. Demographic parity picks for each group a threshold such that the fraction of non-defaulting group members that qualify for loans is the same. D. Max Profit will pick for each group the threshold that maximizes profit.

C. Demographic parity picks for each group a threshold such that the fraction of non-defaulting group members that qualify for loans is the same.

The interquartile range is a common measure of A. Central tendency. B. Standard deviation. C. Dispersion. D. The third moment.

C. Dispersion.
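The interquartile range is Q3 minus Q1, which is the spread of the middle half of the data. This sketch uses Python's statistics.quantiles (Python 3.8+) with made-up ages. Quartile conventions differ slightly between tools, so other software may report a slightly different value.

```python
import statistics

def iqr(values):
    # With n=4, quantiles returns the three quartile cut points [Q1, Q2, Q3]
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

ages = [22, 25, 29, 31, 35, 38, 41, 47, 52, 60, 75]
print(iqr(ages))  # 19.5
```

If the largest age (75) is replaced with 750, the IQR does not change. That insensitivity to extreme values is why the IQR is considered a robust measure of dispersion.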

An important component of the parsing process for text data is A. Querying a database to obtain the data. B. Analyzing the text data. C. Eliminating unnecessary stopwords. D. Writing a program to parse the text data.

C. Eliminating unnecessary stopwords.
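Below is a minimal sketch of tokenizing text and removing stopwords in Python. The stopword list is a small hypothetical one; real projects typically use a larger curated list.

```python
import re

# A tiny, hypothetical stopword list
STOPWORDS = {"the", "a", "an", "and", "of", "to", "was", "on", "in", "at"}

def parse_note(text):
    tokens = re.findall(r"[a-z0-9]+", text.lower())   # lowercase, split on non-alphanumerics
    return [t for t in tokens if t not in STOPWORDS]  # drop stopwords

print(parse_note("The insured was rear-ended at the light on Main St."))
```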

Which one of the following accurately reflects the frequency with which the United States census is conducted? A. Every year B. Every five years C. Every ten years D. Every twenty years

C. Every ten years

Which one of the following is correct regarding the use of Excel to analyze data? A. Excel is suitable for large projects, but not for smaller projects. B. Excel cannot be used to analyze data. C. Excel is suitable for smaller projects, but not for large projects. D. Excel can be used to analyze any type of data.

C. Excel is suitable for smaller projects, but not for large projects.

The simplest way to combine data from two tables to retrieve needed data using Structured Query Language (SQL) is to include the names of both tables in which one of the following statements in the query? A. SELECT. B. WHERE. C. FROM. D. EQUAL JOIN.

C. FROM.
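An implicit join lists both tables in FROM. The WHERE clause then matches the primary key of one table to the foreign key of the other. This sketch uses Python's sqlite3 module, and the policy and claims tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: policy_id is the primary key of policy and a foreign key in claims
conn.execute("CREATE TABLE policy (policy_id INTEGER PRIMARY KEY, holder TEXT)")
conn.execute(
    "CREATE TABLE claims (claim_id INTEGER PRIMARY KEY, "
    "policy_id INTEGER REFERENCES policy(policy_id), amount REAL)"
)
conn.executemany("INSERT INTO policy VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
conn.executemany(
    "INSERT INTO claims VALUES (?, ?, ?)",
    [(10, 1, 1200.0), (11, 1, 300.0), (12, 2, 800.0)],
)

# Both table names appear in FROM; WHERE matches primary key to foreign key
rows = conn.execute(
    "SELECT p.holder, c.claim_id, c.amount "
    "FROM policy p, claims c "
    "WHERE p.policy_id = c.policy_id "
    "ORDER BY c.claim_id"
).fetchall()
print(rows)
```

Without the WHERE condition, the query would return every policy paired with every claim. That is a Cartesian product of six rows (2 policies times 3 claims).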

The regulation whose goals sometimes conflict with other goals, such as the right of privacy, national security, and criminal investigations, is most noteworthy for the A. Gramm-Leach-Bliley (GLB) Act. B. Health Insurance Portability and Accountability Act of 1996 (HIPAA). C. Freedom of Information Act (FOIA) of 1966. D. Privacy Act of 1974.

C. Freedom of Information Act (FOIA) of 1966.

Grandview Insurance, a newly formed organization, is collecting the necessary data to send to its state insurance regulator and wonders whether its effort will pay off over time. Which one of the following will Grandview realize as an advantage to filing its statistical plan data? A. Grandview can file the data in any format it desires, including any fields it feels are necessary, thus ensuring little disruption to its normal business. B. Grandview will find that the statistical plan data fields are the same as those that Grandview deems most essential and that reflect Grandview's particular needs and preferences. C. Grandview will be forced to keep current records and be aware of new codes and changes in variables so that its data will not be rejected and have to be reworked. D. Grandview's audio and video recordings and adjusters' notes can be sent in their current formats, facilitating the availability of this information to other insurers.

C. Grandview will be forced to keep current records and be aware of new codes and changes in variables so that its data will not be rejected and have to be reworked.

By performing a query to obtain data from a policy table and a claims data table without using Structured Query Language (SQL), a data scientist A. Efficiently processes data for thousands of policies and claims. B. Performs a query for each table, saving time and effort. C. Has to match and combine information from the tables. D. Can seamlessly combine the tables together.

C. Has to match and combine information from the tables.

Which one of the following statements, according to Loukides et al., cannot be described as one of the five C's? I. Agreement about what data is being collected and how that data will be used II. Users must have clarity about what data they are providing, what is going to be done with the data, and any downstream consequences of how their data is used III. An organization that exposes user data can do so if it has the best intentions IV. It's often impossible to reduce the amount of data collected, or to have data deleted later A. I B. II C. III D. IV

C. III

Bob's team is creating a dataset for an underwriting model. Bob urges his team to document how they perform the data preparation work. Which one of the following is a reason that this documentation is important? A. The team needs to know what they are trying to predict—in other words, the target variable. B. The team will need to refresh the training model. C. If the model is implemented, the data used in production will need to be transformed and binned in the same way as the data used in the training model. D. Policy data will need to be joined to claims data to predict losses at the policy level.

C. If the model is implemented, the data used in production will need to be transformed and binned in the same way as the data used in the training model.

Mark is a data scientist working on preparing data for a next-best-action model. Which one of the following should he understand about using data from claims notes? A. The responsibility for the data quality of claims notes falls on the claims professional. B. Claims notes typically don't provide interesting data for next-best-action models. C. In claims notes, most claims professionals use acronyms, abbreviations, and slang that will be foreign to a data scientist. D. The text in claims notes is considered structured text.

C. In claims notes, most claims professionals use acronyms, abbreviations, and slang that will be foreign to a data scientist.

A perfectly symmetric normal distribution has a familiar bell-shaped curve. Which one of the following statements is true regarding such a display? A. Its mean and median are the same values of zero, with the distribution's skewness of zero. B. The display could be used to represent a lognormal distribution or a Pareto distribution. C. Its mean and median are the same value, which is exactly in the middle of the data. D. Most insurance variables have symmetrical distributions as represented by this display.

C. Its mean and median are the same value, which is exactly in the middle of the data.

The method a typical computer program uses to structure video content is an algorithm that detects A. Motion. B. Files. C. Keyframes. D. Diagrams.

C. Keyframes.

Which one of the following string processing functions returns the number of characters in the string? A. LEFT B. CONCAT C. LEN D. RIGHT

C. LEN

Which one of the following is true regarding displays? A. Dual-scaled axes are meaningful for most audiences and should be used as often as possible. B. Many different icons and partial icons add depth and understanding to a pictogram. C. Legends are appropriate for charts with multiple bars, as directly labeling the bars can be difficult. D. Legends with heavy borders prominently displayed are generally the most appropriate.

C. Legends are appropriate for charts with multiple bars, as directly labeling the bars can be difficult.

Perl has functions that check for subpatterns. If a data scientist is looking for matches for the string "lung cancer," which regular expression would look for matches to "lung" only if it is followed by "cancer"? A. Backreference B. Left of the string C. Lookahead D. Lookbehind

C. Lookahead
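Python's re module supports the same lookahead syntax as Perl, (?=...). Here is a short sketch on invented sample text.

```python
import re

text = "lung cancer screening; lung function test; lung cancer risk"

# Positive lookahead: match "lung" only when " cancer" follows, without consuming it
pattern = re.compile(r"lung(?= cancer)")
matches = [(m.start(), m.group()) for m in pattern.finditer(text)]
print(matches)
```

The lookahead checks that " cancer" follows, but it does not include those characters in the match. Each match is therefore just "lung", and the "lung function" occurrence is skipped.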

Henry, a data scientist, is discovering many data quality issues in his predictive analytic modeling project. He has most recently run reports and analyses of a more routine actuarial nature. Which one of the following best explains why he is finding more quality issues in his current project? A. The role of data manager has not reached the recognition needed to ensure quality data for predictive modeling projects. B. Fewer data fields are used in predictive modeling which may not align the data used in the analysis. C. More data fields are used in predictive modeling, and some may have more quality issues than ones used in more routine analysis. D. Fewer resources are available for maintaining quality data in predictive modeling projects because of poor financial growth.

C. More data fields are used in predictive modeling, and some may have more quality issues than ones used in more routine analysis.

Which one of the following is unique to R as opposed to Perl concerning regular expressions? A. Greedy matches B. Backreference operators C. Nongreedy matches D. Lookahead operators

C. Nongreedy matches

The process by which a data scientist eliminates redundant data elements and compound columns in a SQL database is called A. Creating statements. B. Design. C. Normalization. D. Unique identification.

C. Normalization.

The advantage of operational data over statistical plan data is demonstrated best by which one of the following statements? A. High-quality standards are required of operational data. B. Operational data stays consistent with state-mandated codes. C. Operational data is proprietary and reflects an insurer's needs. D. Operational data is compatible with other insurers' collected data.

C. Operational data is proprietary and reflects an insurer's needs.

According to the Gramm-Leach-Bliley (GLB) Act, A. Pretexting, which is securing a customer's opt out online, is only allowed under certain conditions. B. Privacy notices must be sent through the mail or online, with neither method requiring an acknowledgement receipt. C. Opting out of information sharing with companies, such as those that provide data-processing services, is not available. D. Customers must only receive a privacy notice when his or her information is shared, subject to certain conditions.

C. Opting out of information sharing with companies, such as those that provide data-processing services, is not available.

Which one of the following is a method a data scientist could use to obtain data about website use? A. Query B. Geocode C. Packet sniffer D. Video content analytics

C. Packet sniffer

Which one of the following is true of both a summary-based statistical plan and a transaction-based statistical plan? A. Plans provide the actuary with information needed for rates and other studies. B. Individual claimant-level loss data is provided without a supplemental report. C. Plans collect sufficient detail for standard regulatory reporting purposes. D. Premium and loss transactional records are combined into a single record.

C. Plans collect sufficient detail for standard regulatory reporting purposes.

A data set consists of heights for third-, fourth-, and fifth-grade students. The fitted value for each grade is the mean height. Which one of the following approaches best describes how to evaluate whether grade distributions differ only in location? A. Plot the fitted heights for one grade on the vertical scale and the fitted heights for all grades on the horizontal scale to evaluate the pattern of residuals. B. Plot the fitted heights for one grade on the vertical scale and the quantiles of the residuals for all grades on the horizontal scale to evaluate the pattern of residuals. C. Plot the quantiles of the residuals for one grade on the vertical scale and the quantiles of the residuals for all grades on the horizontal scale to evaluate the pattern of residuals. D. Plot the quantiles of the residuals for one grade on the vertical scale and the fitted heights for all grades on the horizontal scale to evaluate the pattern of residuals.

C. Plot the quantiles of the residuals for one grade on the vertical scale and the quantiles of the residuals for all grades on the horizontal scale to evaluate the pattern of residuals.

Which one of the following rating elements is captured in the exposure data element for homeowners insurance? A. Location of risk B. Public protection C. Policy limits D. Construction

C. Policy limits

When joining two tables, the unique identifier in the primary table and the unique identifier in the other table, to which the primary data is joined, are respectively the A. Primary key and secondary key. B. EQUAL JOIN and FULL OUTER JOIN. C. Primary key and foreign key. D. RIGHT JOIN and LEFT JOIN.

C. Primary key and foreign key.

R is a A. Relational database. B. Spreadsheet. C. Programming language. D. Data warehouse.

C. Programming language.

In Structured Query Language (SQL), an insurer's table of current auto policy customers has a column that indicates the date a premium payment was received. This data field is null for customers who are A. Choosing to remain with the insurer. B. Not purchasing the insurance. C. Purchasing but not yet paying for the insurance. D. Not reporting any claims.

C. Purchasing but not yet paying for the insurance.
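This sketch uses Python's sqlite3 module and a hypothetical auto_policy table. It shows that a null payment date must be tested with IS NULL, not = NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: payment_date stays NULL until a premium payment is received
conn.execute("CREATE TABLE auto_policy (policy_id INTEGER PRIMARY KEY, payment_date TEXT)")
conn.executemany(
    "INSERT INTO auto_policy VALUES (?, ?)",
    [(1, "2024-01-15"), (2, None), (3, "2024-02-01"), (4, None)],  # None is stored as NULL
)

# IS NULL finds the unpaid policies; "= NULL" never evaluates to true
unpaid = [r[0] for r in conn.execute(
    "SELECT policy_id FROM auto_policy WHERE payment_date IS NULL ORDER BY policy_id")]
wrong = conn.execute(
    "SELECT COUNT(*) FROM auto_policy WHERE payment_date = NULL").fetchone()[0]
print(unpaid, wrong)  # [2, 4] 0
```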

Which one of the following is the best example of a census block border? A. State lines B. County lines C. Railroad tracks D. Zip code tabulation areas

C. Railroad tracks

If a data scientist wants to select specific columns in a SQL table to be displayed, he or she would use which one of the following statements? A. SELECT * B. * C. SELECT and column names D. SELECT and FROM

C. SELECT and column names

A team of data scientists wants to study and predict when policyholders will not renew their policies. They begin working with data from several insurers. Unfortunately, a miscommunication occurred, and the data scientists were provided only with data about policyholders who did not renew after one policy term. This is an example of A. Outliers. B. Imputation. C. Sampling bias. D. Missing values.

C. Sampling bias.

The Unicode characters for the languages of different regions are contained in A. Dingbats. B. Alphanumeric symbols. C. Script blocks. D. Bits.

C. Script blocks.

A chi-square test A. Shows that the null hypothesis can be rejected and thus there is no significant relationship between two variables. B. Is a complicated yet very accurate test to show causal relationships among variables. C. Shows that the null hypothesis is accepted and there is no significant relationship between two variables. D. Does not require the determination of whether the distribution of a variable is roughly equal in each cell.

C. Shows that the null hypothesis is accepted and there is no significant relationship between two variables.

Which one of the following transformations is useful for time series data to reduce the effect of the random variation and reveal underlying trends? A. Transforming to approximate normality B. Computing derived variables C. Smoothing the data D. Censoring a variable

C. Smoothing the data
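A centered moving average is one simple way to smooth data; other smoothing methods exist. Below is a minimal Python sketch that assumes an odd window size and uses made-up monthly claim counts.

```python
def moving_average(series, window=3):
    """Centered moving average; the ends, which lack a full window, are dropped."""
    if window % 2 == 0:
        raise ValueError("this sketch assumes an odd window size")
    half = window // 2
    return [
        sum(series[i - half:i + half + 1]) / window
        for i in range(half, len(series) - half)
    ]

monthly_claims = [10, 14, 9, 15, 11, 17, 13, 19]
smoothed = moving_average(monthly_claims)
print([round(v, 2) for v in smoothed])
```

The raw counts jump up and down from month to month. The smoothed series varies over a much narrower range and shows the upward trend more clearly.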

Formatted text files are similar to A. Queries. B. Strings. C. Spreadsheets. D. Unicode.

C. Spreadsheets.

An insurer's operational data is usually available sooner than its A. Claims data and notes. B. Premium data. C. Statistical plan data. D. Producer data.

C. Statistical plan data. An insurer's operational data is usually available sooner than its statistical plan data. (Producer data, premium data, and claims data and notes are all examples of operational data.)

Which one of the following is an example of an interval scale measurement? A. Premiums collected during a policy year B. An assessment of an auto insured's likelihood of being involved in an auto accident C. Temperatures measured on the Celsius scale D. Five auto insureds arranged according to their likelihood of being involved in an auto accident

C. Temperatures measured on the Celsius scale

Although the terms "outlier" and "anomaly" are often used interchangeably, a distinction can be made A. That an outlier is a missing value, while an anomaly is an empty value. B. That an anomaly is a legitimate data point distant from the other data, while an outlier is an illegitimate data point. C. That an outlier is a legitimate data point distant from the other data, while an anomaly is an illegitimate data point. D. That an anomaly is a missing value, while an outlier is an empty value.

C. That an outlier is a legitimate data point distant from the other data, while an anomaly is an illegitimate data point.

Which one of the following organizations provides claims information at an individual person and/or property level? A. Insurance Services Office, Inc. B. The American Association of Insurance Services C. The Comprehensive Loss Underwriting Exchange D. A.M. Best

C. The Comprehensive Loss Underwriting Exchange

A data scientist uses a command in R to "read.csv." Which one of the following explains this command? A. The file to be read is a formatted text file with fixed width data fields. B. The file is to be parsed and converted to a structured version. C. The file to be read is in a delimited text file format with a comma as the delimiter. D. The file needs to be parsed, and all stopwords must be eliminated.

C. The file to be read is in a delimited text file format with a comma as the delimiter.
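Python's standard csv module is a rough counterpart to R's read.csv for comma-delimited files. This sketch uses inline data in place of a file.

```python
import csv
import io

# Comma-delimited text; a real file's contents would look the same
raw = "policy_id,state,premium\n1001,PA,850.00\n1002,NJ,1200.50\n"

rows = list(csv.DictReader(io.StringIO(raw)))  # the first line is treated as the header
print(rows[0])
```

R's read.csv converts numeric-looking columns automatically. By contrast, csv.DictReader returns every value as a string, so numeric fields need explicit conversion (for example, float(rows[0]["premium"])).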

Data visualization communicates A. Field definitions of the data elements that are collected in insurance transaction processing. B. Summary statistics. C. The information that data provides through visual objects contained in graphs. D. Lines of code.

C. The information that data provides through visual objects contained in graphs.

If a report shows that data will not be valuable to a business, A. The organization should build a data table using a program such as SQL Server. B. It means there was not adequate metadata. C. The organization should not invest in modeling using that particular dataset. D. It means there is not enough data about a particular attribute.

C. The organization should not invest in modeling using that particular dataset.

Which one of the following is true regarding the use of points and lines in a graph? A. Trend lines are encouraged in scatterplots to add accuracy and clarity to the provided information. B. Axes' increments of three, four, eight, or nine are generally readily understood and improve understanding. C. The origin of the x-axis on a graph should typically be set to zero to avoid misleading the audience. D. If one line is to be emphasized, it should be a different color, with the unemphasized lines being black.

C. The origin of the x-axis on a graph should typically be set to zero to avoid misleading the audience.

A distribution is A. The interquartile range. B. A data analysis technique. C. The range of values in a dataset. D. A two-dimensional plot of point values.

C. The range of values in a dataset.

Without the use of hash functions for data segmentation, which one of the following is more likely to occur? A. The split will be easily reproducible. B. Replication of analysis is facilitated. C. The split between datasets will be biased. D. The split between datasets will be unbiased.

C. The split between datasets will be biased.
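Hashing a stable record ID, such as a policy number, gives each record a fixed bucket. As a result, the same records always land in the same dataset. A good hash also scatters IDs evenly, so the order or content of the data does not drive the split. Below is a minimal Python sketch; the ID format and the 20 percent test fraction are assumptions.

```python
import hashlib

def in_test_set(record_id: str, test_fraction: float = 0.2) -> bool:
    """Deterministically assign a record to the test set from a hash of its ID."""
    digest = hashlib.sha256(record_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100           # map the hash to a bucket 0..99
    return bucket < test_fraction * 100

ids = [f"POL{n:05d}" for n in range(1000)]   # hypothetical policy numbers
test_ids = [i for i in ids if in_test_set(i)]
print(len(test_ids))                         # close to 200, and identical on every run
```

The sketch uses hashlib rather than Python's built-in hash(). The built-in function is randomized per process for strings, so it would not give a reproducible split across runs.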

Testing data can help ensure that the data adheres to A. Retrospective rating policies. B. An overly simplified data understanding that is applicable to the real world. C. The standards of atomicity, consistency, isolation, and durability. D. The defects in the database.

C. The standards of atomicity, consistency, isolation, and durability.

Which one of the following is true regarding the Privacy Shield agreement between the European Union (EU) and the United States (U.S.)? A. The EU regards the agreement as providing less protection than Safe Harbor. B. Methods of redress, not previously covered, are now agreed upon. C. This agreement includes added regulatory and enforcement requirements for the U.S. D. Countries can have their privacy and security standards validated.

C. This agreement includes added regulatory and enforcement requirements for the U.S.

Which one of the following is an activity that is part of the Describe Data task? A. Form a hypothesis and identify actions. B. Identify missing attributes and blank fields. C. Understand the meaning of each attribute and attribute value in business terms. D. Specify data selection criteria.

C. Understand the meaning of each attribute and attribute value in business terms.

Concerning data scientists and null values in Structured Query Language (SQL), A. Awareness of the null default is unimportant. B. Tables can only be designed using the null default. C. Unknown data at the present time may be considered null. D. "Null" indicates, without exception, no value for a particular data field.

C. Unknown data at the present time may be considered null.

The written premium collected on each transaction record is A. The same as the on-level premium for all lines of insurance. B. Only reported if it has been billed by the insurer and paid by the insured. C. Used for reconciling the reported statistical data with Statutory Page 14 data. D. The direct premium charged for coverage taking into account related reinsurance transactions.

C. Used for reconciling the reported statistical data with Statutory Page 14 data.

The descriptive approach to data-driven decision making A. Is exemplified by automated underwriting for personal auto insurance. B. Maximizes the use of automation for reliable predictions time after time. C. Uses classical analytical and visualization tools as well as machine learning. D. Involves identifying and implementing analysis that will be repeated.

C. Uses classical analytical and visualization tools as well as machine learning.

Which one of the following is true in regard to using analytic tools to identify atypical values for a particular variable? A. A common formula for standardizing a variable is to subtract the mean and multiply by the standard deviation. B. The rule of thumb that values greater or less than three standard deviations from the mean is particularly applicable for heavy-tailed insurance data. C. Variance and standard deviation are scale dependent and increase as the scale of a variable increases without the relative variability increasing. D. An unusually narrow range (given the number of values) or few extreme minimum or maximum values will suggest the presence of outliers.

C. Variance and standard deviation are scale dependent and increase as the scale of a variable increases without the relative variability increasing.
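Multiplying every value by 100 (converting dollars to cents) multiplies the standard deviation by 100. The coefficient of variation and the standardized values, (x - mean) / sd, stay the same. Here is a Python sketch with made-up loss amounts.

```python
import statistics

def spread(data):
    """Return (standard deviation, coefficient of variation) for a sample."""
    sd = statistics.stdev(data)
    return sd, sd / statistics.mean(data)

def standardize(data):
    """z-scores: subtract the mean, then DIVIDE by the standard deviation."""
    m, sd = statistics.mean(data), statistics.stdev(data)
    return [(x - m) / sd for x in data]

losses_usd = [1200, 1500, 900, 2100, 1800]
losses_cents = [x * 100 for x in losses_usd]   # same losses on a different scale

sd_usd, cv_usd = spread(losses_usd)
sd_cents, cv_cents = spread(losses_cents)
print(f"sd: {sd_usd:.2f} vs {sd_cents:.2f}")   # scale dependent
print(f"cv: {cv_usd:.4f} vs {cv_cents:.4f}")   # unchanged
```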

Which one of the following statements is true regarding Structured Query Language (SQL) advanced topics? A. With all implementations of SQL, user-defined functions (UDFs) require a different language. B. Multiple columns can be used in designating a primary key, but not used in creating an index. C. When multiple columns are designated in an index, SQL will search in the order in which the columns are listed. D. A growing number of unique indexes helps guarantee efficient database performance.

C. When multiple columns are designated in an index, SQL will search in the order in which the columns are listed.

You notice that when you imported some data into R, you imported a field named accdate as a character type, but the values are all numeric and contain a four digit year followed by a 2 digit month and 2 digit day. Which one of the following will properly convert accdate into R's date format? A. as.character(accdate,"%Y%m%d") B. as.character(as.Date(accdate),"%Y%m%d") C. as.Date(accdate,"%Y%m%d") D. as.Date(as.character(accdate),"%m%d%Y")

C. as.Date(accdate,"%Y%m%d")
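For comparison, Python parses the same year-month-day text with datetime.strptime. It uses the same %Y%m%d format codes as R's as.Date. The dates in this sketch are made up.

```python
from datetime import datetime

accdate = ["20190315", "20201101", "20211231"]  # character values, as in the question

# "%Y%m%d" = four-digit year, two-digit month, two-digit day
parsed = [datetime.strptime(s, "%Y%m%d").date() for s in accdate]
print(parsed[0])  # 2019-03-15
```

If the values had been read as numbers rather than text, they would first need converting with str(). That step corresponds to using as.character() in R.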

You are exploring the relationship between industry (a categorical variable identifying the industry a stock company is in) and market_cap (the market capitalization of a stock company). Both fields are in a data frame named stock_cos. Which one of the following choices will create a box plot displaying percentiles of market capitalizations by industry? A. boxplot(industry, market_cap, stock_cos) B. boxplot(market_cap, industry, stock_cos) C. boxplot(market_cap ~ industry, stock_cos) D. boxplot(industry ~ market_cap, stock_cos)

C. boxplot(market_cap ~ industry, stock_cos)

You are creating a graph that will have a large number of overlapping lines. Which one of the following choices would best specify the color of the lines so that the density of the lines can be seen on a white background? A. col=rgb(1,1,1,0.2) B. col=rgb(0,0,0,1) C. col=rgb(0,0,0,0.2) D. col=rgb(1,1,1,1)

C. col=rgb(0,0,0,0.2)

What is entropy?

A measure of how unpredictable something is
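Shannon entropy, measured in bits, is one common way to quantify unpredictability. It is the sum of p times log2(1/p) over the outcome probabilities. Here is a minimal Python sketch that estimates the probabilities from observed frequencies.

```python
import math
from collections import Counter

def shannon_entropy(outcomes):
    """Entropy in bits: sum of p * log2(1/p) over observed outcome frequencies."""
    counts = Counter(outcomes)
    n = len(outcomes)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

print(shannon_entropy("HTHT"))  # 1.0  (two equally likely outcomes)
print(shannon_entropy("HHHH"))  # 0.0  (no uncertainty)
```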

Which one of the following regular expressions is used to denote any character? A. "." B. "\d" C. "\S" D. "\"

A. "."

Which one of the following is, in many languages, the escape character needed for a program to recognize special characters as legitimate? A. "\" B. "/" C. "$" D. "^"

A. "\"
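The two cards above work together. An unescaped "." matches any character, and the escape character "\" makes it match only a literal period. Here is a Python sketch with made-up values.

```python
import re

amounts = ["12.50", "12350", "12-50"]

# Unescaped "." matches ANY single character
any_char = [s for s in amounts if re.fullmatch(r"12.50", s)]
# "\." escapes the dot, so it matches only a literal period
literal_dot = [s for s in amounts if re.fullmatch(r"12\.50", s)]

print(any_char)     # ['12.50', '12350', '12-50']
print(literal_dot)  # ['12.50']
```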

Let x(1), ..., x(n) represent data set A, ordered from smallest to largest. Let y(1), ..., y(n) represent data set B, ordered from smallest to largest. A q-q plot is constructed to compare distributions A and B. Distribution A is assigned to the horizontal axis and distribution B is assigned to the vertical axis. The points graphed on the panel consistently lie below the line A = B. C is a constant. Which one of the following relationships is most likely true? A. A - B = C B. A + B = C C. B - C = A D. B - A = C

A. A - B = C

If data is exchanged with JSON, A. A data scientist can then analyze or model the data directly. B. A data scientist must convert it into a programming language to manipulate the data. C. A data scientist is then unable to perform any analysis on the data. D. A data scientist then needs to use XML to structure the data.

A. A data scientist can then analyze or model the data directly.
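This sketch uses Python's standard json module on a hypothetical payload. json.loads turns the JSON text into ordinary dictionaries, lists, and numbers that can be analyzed directly.

```python
import json

# A hypothetical JSON payload, e.g. as returned by a web API
payload = '{"policy_id": 1001, "claims": [{"id": 1, "paid": 1200.0}, {"id": 2, "paid": 450.5}]}'

record = json.loads(payload)                           # parsed into dicts, lists, numbers
total_paid = sum(c["paid"] for c in record["claims"])
print(record["policy_id"], total_paid)  # 1001 1650.5
```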

A measure of exposure for general liability insurance is A. A fixed exposure base, such as square footage of store space. B. The car-years or car-months generated by various transaction dates. C. The house-years based on the implied policy term. D. The amount-of-insurance years determined by coverage amount.

A. A fixed exposure base, such as square footage of store space.

Which one of the following is true regarding hash functions for hash tables? A. A hash table allows for fast information storage and retrieval. B. In a dictionary, values are words and keys are definitions. C. Keys as well as values are generally short. D. Keys as well as values must be unique.

A. A hash table allows for fast information storage and retrieval.

Regarding linear and nonlinear correlations, A. A monotonic relationship involves variables that steadily and smoothly increase or decrease, but not both. B. A nonmonotonic relationship between variables is a nonlinear one without abrupt reversals in fluctuation. C. A nonmonotonic relationship may use a linear relationship model to approximate the relationship. D. A monotonic relationship between variables is a linear one with abrupt reversals in fluctuation.

A. A monotonic relationship involves variables that steadily and smoothly increase or decrease, but not both.

When using a power transformation to adjust non-normal variables, the effectiveness of the transformation depends on the selection of an appropriate T parameter. Which one of the following statements regarding the selection of the T parameter is true? A. A trial and error method can be used to identify the value of the T parameter that will be most effective. B. For data sets with zeroes, a power transformation with a T parameter less than zero will be most effective. C. If the ratio of the largest observation to the smallest observation of a data set is very close to 1, power transformations with T from -1 to 1 have a large effect. D. For a highly skewed distribution, a power transformation with the parameter T = 1 will be most effective.

A. A trial and error method can be used to identify the value of the T parameter that will be most effective.
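
A minimal sketch of trial-and-error selection of T, assuming strictly positive data (x^T with T ≤ 0, and the log used for T = 0, are undefined at zero) and using absolute sample skewness as the yardstick, which is one reasonable criterion rather than the only one. The data are invented for illustration.

```python
import math

def skewness(xs):
    """Moment-based skewness: m3 / m2**1.5."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m3 = sum((x - mu) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def power_transform(xs, t):
    """x**t for t != 0; log(x) stands in for t = 0 (a common convention)."""
    return [math.log(x) if t == 0 else x ** t for x in xs]

# Hypothetical positively skewed, strictly positive data.
data = [1, 2, 2, 3, 3, 4, 5, 8, 13, 40]

# Trial and error: try a grid of T values and keep the least skewed result.
candidates = [-1, -0.5, 0, 0.5, 1]
best_t = min(candidates, key=lambda t: abs(skewness(power_transform(data, t))))
print(best_t)
```

Because T = 1 leaves the data unchanged, the chosen T is never worse than no transformation under this criterion.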

Data quality A. Allows actuaries to focus more on issues that maximize profits. B. Is almost always compromised because of clerical errors. C. Has one definition that is applicable to all its various users. D. Has not been highlighted to the point of establishing data managers.

A. Allows actuaries to focus more on issues that maximize profits.

Which one of the following can be used for web scraping? A. Beautiful Soup B. Application Programming Interface C. XML D. HTML

A. Beautiful Soup

Which one of the following can a data scientist use to read web languages and load their data into Python? A. Beautiful soup B. R C. HTML D. SQL

A. Beautiful soup
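
Beautiful Soup is a third-party package (`bs4`). Where it is installed, `BeautifulSoup(html, "html.parser").find_all("td")` returns every table cell. To stay dependency-free, the sketch below swaps in the standard library's `html.parser.HTMLParser` to do the same job on a hypothetical HTML fragment. In practice the string would be downloaded from a web page.

```python
from html.parser import HTMLParser

class CellCollector(HTMLParser):
    """Collects the text of every <td> cell in an HTML table."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell and data.strip():
            self.cells.append(data.strip())

# Hypothetical page fragment.
html = "<table><tr><td>Homeowners</td><td>1200</td></tr><tr><td>Auto</td><td>950</td></tr></table>"
parser = CellCollector()
parser.feed(html)
parser.close()

# Regroup the flat list of cells into two-column rows ready for analysis.
rows = [parser.cells[i:i + 2] for i in range(0, len(parser.cells), 2)]
print(rows)  # [['Homeowners', '1200'], ['Auto', '950']]
```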

The Unicode Standard defines how the value of each character is represented in A. Bits. B. Dingbats. C. Technical symbols. D. Diacritics.

A. Bits.
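
Strictly, Unicode assigns each character a code point, and its encoding forms (UTF-8, UTF-16, UTF-32) specify the bits that store it. This sketch uses UTF-8 to show the bit patterns for three characters.

```python
def unicode_bits(ch):
    """Return the code point label and the UTF-8 bit pattern of one character."""
    utf8 = ch.encode("utf-8")
    return f"U+{ord(ch):04X}", " ".join(f"{byte:08b}" for byte in utf8)

for ch in ["A", "é", "€"]:
    print(ch, *unicode_bits(ch))
# A U+0041 01000001
# é U+00E9 11000011 10101001
# € U+20AC 11100010 10000010 10101100
```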

John Tukey is particularly recognized for his contribution to development of the A. Boxplot. B. Vertical bar graph. C. Histogram. D. Gaussian function.

A. Boxplot.

Which one of the following best describes the manner in which the United States Postal Service identifies mail delivery routes? A. By zip code B. By American National Standards Institute code C. By census tract D. By census block

A. By zip code

Using Structured Query Language (SQL), a statement that will return a table with the specific number of policies in a database should particularly contain which one of the following? A. COUNT B. GROUP BY C. FROM D. SELECT

A. COUNT
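
A minimal sketch using Python's built-in `sqlite3` module and a hypothetical in-memory `policies` table. `COUNT(*)` returns the number of rows, which here is the number of policies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE policies (policy_id TEXT PRIMARY KEY, line TEXT)")
conn.executemany(
    "INSERT INTO policies VALUES (?, ?)",
    [("P1", "Auto"), ("P2", "Homeowners"), ("P3", "Auto")],
)

# COUNT(*) counts rows; one row per policy.
(policy_count,) = conn.execute("SELECT COUNT(*) FROM policies").fetchone()
print(policy_count)  # 3
```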

Which one of the following is true regarding data quality? A. Capturing enough data to generate statistically significant results generally leads to quality data. B. The "fitness" of data is an unchangeable standard regardless of its end users. C. Having quality and accurate data from the start assures it will remain in that condition. D. Fair and accurate insurance rates are unrelated to the quality of the data used in their development.

A. Capturing enough data to generate statistically significant results generally leads to quality data.

Which one of the following statements is correct regarding census tracts? A. Census tracts are small, relatively permanent geographic entities within counties. B. The American National Standards Institute (ANSI) assigns each census tract a four-digit code. C. A number of counties within a geographic area are combined to form a census tract. D. Census tracts are assigned a multi-digit numerical zip code.

A. Census tracts are small, relatively permanent geographic entities within counties.

A data scientist is selecting claims data for a test sample. Which one of the following best explains why the data scientist chooses claims with the highest number of transactions? A. Claims with the highest number of transactions are likely to provide the data scientist with more complicated life cycles to examine, which, in turn, will help the data scientist prepare for unexpected data and data idiosyncrasies. B. Claims with the highest number of transactions will represent how and from whom the data was collected. This will help the data scientist determine whether he or she is making accurate assumptions about policyholders. C. Claims with the highest number of transactions will have come from different sources and need less adjustment in the database. D. Claims with the highest number of transactions will have a known value for any model's target variable.

A. Claims with the highest number of transactions are likely to provide the data scientist with more complicated life cycles to examine, which, in turn, will help the data scientist prepare for unexpected data and data idiosyncrasies.

Insurance regulators study an insurer's National Association of Insurance Commissioners Annual Statement to A. Collect data about the insurer's assets, liabilities, and financial performance. B. Get the detail necessary to match losses and premiums for comparable groups of insurance policies. C. See a snapshot of an insurer's financial condition at the close of each month of business. D. Test the adequacy of rates and classification plans to ensure that rates meet statutory standards.

A. Collect data about the insurer's assets, liabilities, and financial performance.

In a scatterplot display, a line can be plotted to show the relationship between the variables plotted. Which one of the following statements best describes what this line reveals? A. Confidence intervals can be plotted to display the strength of the relationship given the existing noise. B. A forward-moving line from left to right shows a negative correlation between the variables. C. A downward-moving line from left to right shows that no relationship exists between the variables. D. A reliably fit line can be extended indefinitely to predict future outcomes of the same variable activity.

A. Confidence intervals can be plotted to display the strength of the relationship given the existing noise.

An insurer's records generated and reported under a transaction-based statistical plan A. Contain the dollar amount of the premium (positive or negative). B. Generally include summarization at the insurer level. C. Fulfill the regulatory need for data and nothing else. D. Are more meaningful when accounted for on a policy year basis.

A. Contain the dollar amount of the premium (positive or negative).

Which one of the following is true regarding multivariate summaries and displays? A. Contingency tables are also called categorical tables. B. Pivot tables are also known as cross tabulations. C. A null hypothesis assumes association between variables. D. Data points falling on a parabola suggest a linear correlation.

A. Contingency tables are also called categorical tables.

To eliminate duplicate data, a data scientist uses which one of the following with the SELECT statement? A. DISTINCT B. FROM C. WHERE D. COUNT

A. DISTINCT
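
A short `sqlite3` sketch with a hypothetical `claims` table, contrasting a plain `SELECT` with `SELECT DISTINCT`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (claim_id INTEGER, state TEXT)")
conn.executemany(
    "INSERT INTO claims VALUES (?, ?)",
    [(1, "PA"), (2, "NJ"), (3, "PA"), (4, "PA")],
)

# Without DISTINCT every row's value comes back; with it, each value appears once.
all_states = [r[0] for r in conn.execute("SELECT state FROM claims ORDER BY claim_id")]
unique_states = [r[0] for r in conn.execute("SELECT DISTINCT state FROM claims ORDER BY state")]
print(all_states)     # ['PA', 'NJ', 'PA', 'PA']
print(unique_states)  # ['NJ', 'PA']
```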

Which one of the following best describes the relationship between data documentation and data governance? A. Data documentation supports whatever standards are desired or needed by an organization. B. Established documentation processes have little influence on data governance efforts. C. Accurate and appropriate data governs itself regardless of any data documentation policy. D. Data documentation can best be verified by reliance on employees' memories.

A. Data documentation supports whatever standards are desired or needed by an organization.

Sean is a data scientist helping to solve a risk management problem with his company. He is given the data appropriate to the project, and he will analyze it and provide the results to the manager who made the request. The data and tools needed to conduct this analysis are considered A. Data engineering and processing technology. B. Alternative business decisions. C. Data-driven decision making. D. Segmented data.

A. Data engineering and processing technology.

Which one of the following statements most aligns with Tufte's fundamental principles of analytic graphics? A. Data graphics should be appropriately documented with labels, scales, and sources. B. Data graphics and analysis should be driven by the tools and available software. C. Data graphics should include text descriptions to help the reader interpret the plot. D. Data graphics can aid in making your conclusions shine with clarity even when they are based on poor data.

A. Data graphics should be appropriately documented with labels, scales, and sources.

A stakeholder analysis is undertaken by an insurer's data governance committee because A. Data is received on different bases and broken down by several variables. B. Various departments have similar demands for types and formats of collected data. C. Stakeholders come to a consensus on their expectations of how data should be handled. D. Mergers with legacy systems are considered essential to users of insurance data.

A. Data is received on different bases and broken down by several variables. Stakeholder analysis is the process of assessing a system and potential changes to it as they relate to relevant and interested parties. This information is used to assess how the interests of those stakeholders should be addressed in a project plan, policy, program, or other action.

The important first step in the data analytics decision-making model is to A. Define the problem. B. Discover trends and relationships. C. Prepare the data. D. Make the data-driven decision.

A. Define the problem.

Which one of the following categories most accurately describes the type of data provided by the United States Census Bureau? A. Demographic data B. Customer financial data C. Building data D. Behavioral data

A. Demographic data

Which one of the following statements is true for the use of pictograms? A. Displaying many different variables usually works better on a bar chart. B. Icons with intricate shapes are effective in showing comparisons. C. Having the audience count the icons adds interaction and understanding. D. Icons are helpful when they represent very large amounts of data.

A. Displaying many different variables usually works better on a bar chart.

Cathy O'Neil, in her book Weapons of Math Destruction, discusses how, in the earlier days of insurance, actuaries developed predictive models from data for large groups or pools of customers. How does the application of predictive models in insurance cause a departure from the original goal of insurance, which is to share costs over groups or pools of customers? A. Escores, which are not transparent, based on consumer patterns, may be used to reflect whether a customer is likely to shop for a new policy and charge more based on predicted willingness to switch insurers. B. In the early 1900s, African Americans were considered to be uninsurable for life insurance, due to the high mortality rates of some subgroups of African Americans, when actual analysis of data shows that many African Americans had long life expectancies. C. Individuals are protected when misfortune strikes, due to the pooling effect of insurance. D. A drunk driving record is given too much weight in the insurance price.

A. Escores, which are not transparent, based on consumer patterns, may be used to reflect whether a customer is likely to shop for a new policy and charge more based on predicted willingness to switch insurers.

Which one of the following items is most often true about the characteristics of exploratory graphs? A. Exploratory graphs aid in identifying and prioritizing tasks for follow up. B. Details in exploratory graphs are immediately cleaned up and prettified. C. Creating compelling exploratory graphs requires a great deal of time. D. Only one or two exploratory graphs should be made in the exploratory data analysis process.

A. Exploratory graphs aid in identifying and prioritizing tasks for follow up.

Which one of the following statements includes all rows from two tables being joined, in which common rows between the two tables are matched together and rows found on only one table include NULL values for the data fields from the other table? A. FULL JOIN B. RIGHT JOIN C. EQUAL JOIN D. LEFT JOIN

A. FULL JOIN
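
SQLite builds older than version 3.39.0 reject `FULL JOIN`. To run on those builds, this sketch emulates it with a `LEFT JOIN` plus a `UNION ALL` of the unmatched right-hand rows. On databases that support it, `FULL OUTER JOIN ... ON ...` expresses the same thing directly. Table names and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE policies (policy_id TEXT, insured TEXT);
    CREATE TABLE claims   (policy_id TEXT, paid REAL);
    INSERT INTO policies VALUES ('P1', 'Ana'), ('P2', 'Ben');
    INSERT INTO claims   VALUES ('P2', 500.0), ('P9', 75.0);
""")

# LEFT JOIN keeps every policy (NULLs where no claim matches); the UNION ALL
# adds claims with no matching policy (NULLs for the policy fields).
rows = conn.execute("""
    SELECT p.policy_id, p.insured, c.policy_id, c.paid
    FROM policies p LEFT JOIN claims c ON p.policy_id = c.policy_id
    UNION ALL
    SELECT NULL, NULL, c.policy_id, c.paid
    FROM claims c
    WHERE NOT EXISTS (SELECT 1 FROM policies p WHERE p.policy_id = c.policy_id)
""").fetchall()
for row in rows:
    print(row)
```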

Statistical agents A. Gather specific information while helping insurers keep relevant data. B. Develop statistical plans that remain unchanged over time. C. Collect historical insurer experience only by state and class. D. Are unable to focus on specialized lines of business, such as crop-hail.

A. Gather specific information while helping insurers keep relevant data.

At what stage of the ggplot process is data plotted? A. Geom B. Mapping C. Labels & guides D. Coordinates & scales

A. Geom

Classification plans help ensure rate equity by A. Grouping policyholders with similar loss potential. B. Counting claims. C. Making insurance adjustments. D. Dividing claims by exposure.

A. Grouping policyholders with similar loss potential.

Gustav is in charge of maintaining a database at his company and wants to make sure he understands how to safeguard his customers' social security numbers (SSNs). Which one of the following is a procedure that Gustav can appropriately follow based on general state requirements? A. Gustav cannot publicly display SSNs. B. Gustav cannot under any circumstances send documents through the mail containing SSNs. C. Gustav cannot ask for SSNs to be provided over the internet even with encryption. D. Gustav can require SSNs as an identifier for website access without requiring a password.

A. Gustav cannot publicly display SSNs.

Which one of the following is correct regarding HTML? A. HTML has a predefined set of markup tags. B. HTML does not use markup tags. C. HTML uses only one type of markup tag. D. HTML allows a data scientist to create markup tags.

A. HTML has a predefined set of markup tags.

The Review Process includes which one of the following activities? A. Identify misleading steps. B. Select best model. C. Determine deployment strategy. D. Rank results with respect to business success criteria.

A. Identify misleading steps.

Which one of the following is true regarding data quality and an insurer's financial results? A. Improving data quality could free up actuarial resources for more value-producing assignments. B. Data errors exist but rarely reach the point of directly affecting an insurer's financial statement. C. Actuaries report that even though they spend over half their time on data quality issues, errors still cause financial problems. D. More than half of projects undertaken by actuaries are adversely affected by data quality issues.

A. Improving data quality could free up actuarial resources for more value-producing assignments.

Normalized data results A. In variables that lie in the range zero to one. B. When numeric representations are converted to qualitative data. C. When extreme values are coded to the highest reasonable value. D. In variables with more extreme proportions with each other.

A. In variables that lie in the range zero to one.
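
"Normalization" has several meanings. This sketch uses min-max scaling, which matches the keyed answer's zero-to-one range. The ages are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant variable")
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical policyholder ages.
ages = [20, 35, 50, 80]
print(min_max_normalize(ages))  # [0.0, 0.25, 0.5, 1.0]
```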

Annamarie is a data scientist working for a reinsurance broker, collecting and analyzing data on the broker's customers, who are both primary insurers and reinsurers, not individual policyholders. She needs domain knowledge in A. Insurance and reinsurance. B. Computer science. C. Mathematics. D. Customer service.

A. Insurance and reinsurance. Annamarie needs domain knowledge in insurance and reinsurance, which provides the context of her work as a data scientist. Her knowledge of mathematics and computer science helps her as a data scientist, but in her position at a reinsurance broker, the required domain knowledge is that of insurance and reinsurance. Customer service would be more beneficial if she were interfacing with or gathering data on individual policyholders.

Edward Tufte is known for A. Introducing the idea of "chartjunk," which distracts from and confuses a graphic's intended meaning. B. Avoiding chartjunk to create minimalist displays that improve data presentation and understanding. C. Stressing the importance of display elements rather than the presentation's data. D. Espousing better results from a large share of ink being devoted to gridlines and tick marks.

A. Introducing the idea of "chartjunk," which distracts from and confuses a graphic's intended meaning.

A series of videos showing air traffic over Britain has been identified as one of the best data visualizations in recent years. Despite this recognition, several flaws have been noted. Which one of the following statements best describes a criticism of this visualization? A. It is not successful in achieving statistical goals. B. It is not valuable as an eye-catching data display. C. It is not effective in telling a story to the audience. D. It is not useful in communicating information.

A. It is not successful in achieving statistical goals.

After an outlier is detected, A. It should be removed if the outlier was found to be incorrectly entered. B. It can be removed by following objective and consistent standards. C. It should be further assessed if it is proven to be illegitimate. D. It should not remain if the analyst uses intuition to make the decision.

A. It should be removed if the outlier was found to be incorrectly entered.

A data scientist wanting to create a single table that will contain certain information from two separate tables will need to particularly use which one of the following statements? A. JOIN B. SUM C. AVG D. SELECT

A. JOIN
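
A minimal `sqlite3` sketch with two hypothetical tables. The `JOIN` matches rows on a common key (`policy_id`) to produce a single combined result table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE policies (policy_id TEXT PRIMARY KEY, insured TEXT);
    CREATE TABLE claims   (claim_id INTEGER PRIMARY KEY, policy_id TEXT, paid REAL);
    INSERT INTO policies VALUES ('P1', 'Ana'), ('P2', 'Ben');
    INSERT INTO claims   VALUES (10, 'P1', 250.0), (11, 'P1', 100.0), (12, 'P2', 900.0);
""")

# Each claim row is paired with the policy row that shares its policy_id.
rows = conn.execute("""
    SELECT c.claim_id, p.insured, c.paid
    FROM claims c
    JOIN policies p ON p.policy_id = c.policy_id
    ORDER BY c.claim_id
""").fetchall()
print(rows)  # [(10, 'Ana', 250.0), (11, 'Ana', 100.0), (12, 'Ben', 900.0)]
```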

A display with a logarithmic scale for the x-axis and y-axis is a A. Log-log graph. B. Hierarchical table. C. Semi-log graph. D. Tabular display.

A. Log-log graph.
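
One reason log-log graphs are useful: a power law y = a·x^b becomes a straight line with slope b, because log y = log a + b·log x. This sketch checks that numerically, assuming a = 3 and b = 2 purely for illustration.

```python
import math

# Hypothetical power law y = 3 * x**2.
xs = [1, 10, 100, 1000]
ys = [3 * x ** 2 for x in xs]

log_x = [math.log10(x) for x in xs]
log_y = [math.log10(y) for y in ys]

# On log-log axes, the slope between consecutive points equals the exponent b.
slopes = [(log_y[i + 1] - log_y[i]) / (log_x[i + 1] - log_x[i]) for i in range(len(xs) - 1)]
print([round(s, 6) for s in slopes])  # [2.0, 2.0, 2.0]
```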

For which one of the following are tables generally preferred over graphs for information display? A. Looking up individual values. B. Displaying relationships among and between sets of quantitative values. C. Detecting patterns within data. D. Detecting trends within the data.

A. Looking up individual values.

Which one of the following statements is most accurate? A. Loss development factors are ordinarily used to forecast liability losses. B. Loss development factors are never used to forecast liability losses. C. Loss development factors do not incorporate incurred or reported losses. D. Loss development factors are not based on historical patterns.

A. Loss development factors are ordinarily used to forecast liability losses.
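
A minimal chain-ladder-style sketch of how loss development factors come from historical patterns. It assumes volume-weighted age-to-age factors, which are common but only one of several averaging choices. The loss triangle is invented for illustration.

```python
# Hypothetical cumulative paid-loss triangle: rows are accident years,
# columns are development ages (12, 24, 36 months); later cells are not yet known.
triangle = [
    [1000, 1500, 1650],
    [1100, 1650],
    [1200],
]

def age_to_age_factors(tri):
    """Volume-weighted factors: total losses at age k+1 divided by total at age k,
    using only accident years that have reached age k+1."""
    factors = []
    for k in range(len(tri[0]) - 1):
        rows = [r for r in tri if len(r) > k + 1]
        factors.append(sum(r[k + 1] for r in rows) / sum(r[k] for r in rows))
    return factors

ldfs = age_to_age_factors(triangle)
print([round(f, 4) for f in ldfs])  # [1.5, 1.1]

# Project the latest accident year to age 36 by chaining the factors.
projected = triangle[2][0] * ldfs[0] * ldfs[1]
print(round(projected, 2))  # 1980.0
```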

Which statement about geoms in ggplot is true? A. Mapping colors can be continuous or categorical. B. The default geom smoothing method is loess. C. An object with an alpha of 1 will be completely transparent. D. You must add points to your plot before adding a smoothed line.

A. Mapping colors can be continuous or categorical.

Insurance regulators are conducting a periodic examination of Durham Insurance Company. They review Durham's accuracy in rating its policies as reflected in the relevant insurance department's approved guidelines and are satisfied with the results. However, they note several instances of slow and potentially deceptive claims handling and claim settlement practices that will require further review. Through this examination, regulators are fulfilling their duty to monitor and regulate Durham's A. Market conduct. B. Financial solidity. C. Financial solvency. D. Rates.

A. Market conduct.

One of the points in the five-point summary found in a boxplot is the A. Median. B. Mean. C. Mode. D. Moment.

A. Median.
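
Tukey's boxplot is built on the five-number summary: minimum, lower quartile, median, upper quartile, and maximum. This sketch uses `statistics.quantiles` with the `"inclusive"` method. For this odd-length example the result agrees with Tukey's hinges, but quartile conventions differ slightly between methods and tools.

```python
import statistics

def five_number_summary(data):
    """Minimum, lower quartile, median, upper quartile, maximum."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)

print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # (1, 3.0, 5.0, 7.0, 9)
```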

Which one of the following is the data about data that helps determine whether particular data is suitable for a specific purpose and ensures that data is used appropriately? A. Metadata B. Histograms C. Variables D. Tables

A. Metadata

A statistical agent in the United States for major property-casualty lines of insurance (other than workers compensation and health) is the A. National Independent Statistical Services. B. Surety and Fidelity Association of America. C. National Council on Compensation Insurance. D. General Insurance Statistical Agency.

A. National Independent Statistical Services.

Which one of these categories best describes the kind of data provided by the American Community Survey? A. Occupational data B. Behavioral data C. Business financial data D. Motor vehicle data

A. Occupational data

A data scientist has received text data from emails for analysis. Which one of the following is the first step the data scientist should take to prepare the data? A. Parsing and converting the data to a structured format B. Reading all the emails C. Submitting a query to obtain the data D. Designing the analysis

A. Parsing and converting the data to a structured format
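
A minimal sketch of that first step using the standard library's `email` package. It parses one hypothetical raw message into a structured record (a dict) that can then be stored as a table row. The addresses and content are invented.

```python
from email import message_from_string

# Hypothetical raw email text.
raw = """From: adjuster@example.com
Subject: Claim 4521 update
Date: Mon, 3 Jun 2024 09:15:00 -0400

Water damage estimate received; reserve raised to 12,000.
"""

msg = message_from_string(raw)

# Convert the unstructured message into one structured record.
record = {
    "sender": msg["From"],
    "subject": msg["Subject"],
    "date": msg["Date"],
    "body": msg.get_payload().strip(),
}
print(record["subject"])  # Claim 4521 update
```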

Deductible as a data element would be collected as statistical plan data for A. Personal auto policy physical damage coverage. B. Personal auto policy no-fault coverage. C. Homeowners liability coverage. D. Personal auto policy liability coverage.

A. Personal auto policy physical damage coverage.

Nodes in a social network represent a A. Point. B. Directed tie. C. Link. D. Relationship.

A. Point.

Which one of the following is a level of the CRISP-DM hierarchical process model? A. Process Instances B. Assess Situation C. Tool and Technique D. Data understanding

A. Process Instances

The General Insurance Research Organization (GIRO) Data Quality Working Party A. Reported that data quality issues affect insurers' performance. B. Studied data quality issues only as they affect actuaries. C. Found little connection between data quality and profitability. D. Noted that data quality did not affect the reliability of financial statements.

A. Reported that data quality issues affect insurers' performance.

Which one of the following statements specifies the order in which requested columns will appear? A. SELECT B. WHERE C. AND D. OR

A. SELECT
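
A short `sqlite3` sketch with a hypothetical table. The column list in `SELECT`, not the order in the table definition, determines the order of the returned columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE policies (policy_id TEXT, line TEXT, premium REAL)")
conn.execute("INSERT INTO policies VALUES ('P1', 'Auto', 850.0)")

# Columns come back in the order listed after SELECT.
cur = conn.execute("SELECT premium, policy_id FROM policies")
columns = [d[0] for d in cur.description]
row = cur.fetchone()
print(columns)  # ['premium', 'policy_id']
print(row)      # (850.0, 'P1')
```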

The shape of the represented data shows trends or comparisons in which one of the following displays? A. Semi-log graph B. Hierarchical table C. Side-by-side array D. Social network matrix

A. Semi-log graph

Company A has hired a data scientist to develop an effective data display for its experienced internal analysts to monitor company performance. Which one of the following responses best describes the required visualization? A. Simple graphs that focus attention on the data B. Visualizations that encourage involvement C. Attractive graphics to enhance understanding D. Novel data displays to elicit a reaction

A. Simple graphs that focus attention on the data

In most claims databases, negative amounts refer to A. Subrogation recoveries. B. Allocated loss adjustment expenses (ALAE). C. Settlement amounts. D. Unallocated loss adjustment expenses (ULAE).

A. Subrogation recoveries.

Which one of the following options is true about graphics devices in R? A. The currently active graphics device can be found by calling dev.cur(). B. The active graphics device is assigned the number 2. C. Plotting can be performed in multiple graphics devices in a single plot call. D. Only one graphics device can be open at a time.

A. The currently active graphics device can be found by calling dev.cur().

A data scientist is selecting a delimiter for a delimited text file. Which one of the following characteristics should the delimiter have? A. The delimiter should be a character not used in any data field. B. The delimiter should be a character used in a unique data field. C. The delimiter should always be a comma. D. The delimiter should always be a blank space.

A. The delimiter should be a character not used in any data field.
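
A minimal sketch with Python's `csv` module. The hypothetical address field contains commas, so a pipe (`|`), which appears in no field, is used as the delimiter. With `quoting=csv.QUOTE_NONE` and no escape character set, the writer raises `csv.Error` if a field ever contains the delimiter, which enforces the rule.

```python
import csv
import io

# Hypothetical records whose address field contains commas.
records = [
    ["P1", "12 Main St, Apt 4, Reading, PA", "850.00"],
    ["P2", "9 Elm St, Trenton, NJ", "1200.00"],
]

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="|", quoting=csv.QUOTE_NONE)
writer.writerows(records)
text = buffer.getvalue()
print(text)

# Reading back with the same delimiter recovers exactly the original fields.
parsed = list(csv.reader(io.StringIO(text), delimiter="|"))
print(parsed == records)  # True
```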

Robust estimation techniques are valuable for visualizing non-normal data. To assess whether the residuals for different groups of a non-normal data set may be pooled, distributions of the spread-standardized residuals may be graphed by normal q-q plots. Which one of the following descriptions correctly defines the spread-standardized residual? A. The difference between a transformed observation and its group median, divided by its group mean absolute deviation B. The difference between a transformed observation and its group mean, divided by its group standard deviation C. The difference between a transformed observation and its group median, divided by its group standard deviation D. The difference between a transformed observation and its group median

A. The difference between a transformed observation and its group median, divided by its group mean absolute deviation
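
This sketch implements the keyed definition literally. It assumes the mean absolute deviation is taken about the group median, because the definition does not say which center is used. The observations are hypothetical and are assumed to be already transformed.

```python
from statistics import mean, median

def spread_standardized_residuals(group):
    """(x - group median) / group mean absolute deviation.
    Assumption: the mean absolute deviation is measured about the group median."""
    m = median(group)
    residuals = [x - m for x in group]
    mad = mean(abs(r) for r in residuals)
    return [r / mad for r in residuals]

# Hypothetical transformed observations for one group.
print(spread_standardized_residuals([2.0, 4.0, 5.0, 6.0, 8.0]))
```

Residuals from different groups, standardized this way, can then be compared on normal q-q plots to judge whether pooling is reasonable.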

A data display combining the benefits of both Infovis and statistical graphs would be best described by which one of the following statements? A. The display incorporates visual appeal and engagement with the clarity of the best statistical data visualizations. B. The display balances the mutually exclusive aspects of discovery goals and communication goals. C. The display features an eye-catching design with a focus on technically impressive visualization, rather than practical purpose. D. The display is tailored to readers with extensive background knowledge and features standard graphic forms.

A. The display incorporates visual appeal and engagement with the clarity of the best statistical data visualizations.

By including the names of two tables in the FROM statement in a query using Structured Query Language (SQL), a data scientist can expect that A. The first row in one table will be combined in the same row with the first row in the second table. B. The query is likely to produce fewer rows of data than required. C. Only identified and selected rows in each table will be combined. D. The data will likely be matched correctly.

A. The first row in one table will be combined in the same row with the first row in the second table.

Angelina is graphing data points for two variables in a scatterplot and showing Julian, a new analyst working with her, the general guidelines for creating and using these displays. Which one of the following is the most appropriate guidance she can provide? A. The gradient of the plots will be either positive or negative, reflecting the data's positive or negative correlation. B. A scatterplot usually plots the independent variable on the y-axis and the dependent variable on the x-axis. C. A zero correlation as to the strength of the scatterplot would represent data points tightly clustered around a line or other identified shape. D. The form of a scatterplot reveals whether the points are associated with one another, which is positive or negative.

A. The gradient of the plots will be either positive or negative, reflecting the data's positive or negative correlation. Note: a scatterplot's direction is described by its gradient.

When transforming data to approximate normality, A. The logarithmic transformation is commonly applied to positively skewed variables. B. The analyst may raise a variable to the power of the square root, meaning raising the variable to the second power. C. The analyst may square a variable, meaning raising the variable to the 1/2 power. D. The resulting graph appears less like a bell shape and more positively skewed.

A. The logarithmic transformation is commonly applied to positively skewed variables.

Graphs and charts are the best display to use when A. The message of the data is found in its trends or patterns. B. Precision of values matters. C. Multiple units of measure are involved in the analysis. D. Comparison of individual values is important.

A. The message of the data is found in its trends or patterns.

Which one of the following is correct regarding structure in XML? A. The self-described structure of XML documents makes it simpler to transfer data from one system to another. B. The structure of XML documents prevents them from being transferred from one system to another. C. There is very little structure in XML documents. D. The structure of XML documents makes it more difficult to transfer data from one system to another.

A. The self-described structure of XML documents makes it simpler to transfer data from one system to another.
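
A minimal sketch with the standard library's `xml.etree.ElementTree`. Because the tags name the data they hold, a receiving system can extract fields by name without a separate layout specification. The document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document; the tags describe their own contents.
doc = """
<policies>
  <policy id="P1"><line>Auto</line><premium>850.00</premium></policy>
  <policy id="P2"><line>Homeowners</line><premium>1200.00</premium></policy>
</policies>
"""

root = ET.fromstring(doc.strip())
premiums = {p.get("id"): float(p.findtext("premium")) for p in root.findall("policy")}
print(premiums)  # {'P1': 850.0, 'P2': 1200.0}
```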

Joel is a data scientist learning how to query data from a database using Structured Query Language (SQL). He is studying the statements in his manager's query. Which one of the following best represents what he would have learned? A. The statements are not case-sensitive. B. Group functions are the same as aggregate functions. C. To obtain data from an SQL database, start with the FROM statement. D. Use the symbol "=" to end the query.

A. The statements are not case-sensitive.
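
A quick `sqlite3` check with a hypothetical table. SQL keywords are case-insensitive, and in SQLite unquoted table and column names are too. String values inside quotes remain case-sensitive under the default comparison rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE policies (policy_id TEXT)")
conn.executemany("INSERT INTO policies VALUES (?)", [("P1",), ("P2",)])

# The same query written in upper and lower case returns the same result.
upper = conn.execute("SELECT COUNT(*) FROM policies").fetchone()
lower = conn.execute("select count(*) from POLICIES").fetchone()
print(upper == lower)  # True
```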

Which one of the following is mandated for insurers? A. To collect and file statistical plan data B. To file statistical data with a statistical agent C. To file statistical data directly with state insurance regulators D. To file all operational data as the basis for statistical plan data

A. To collect and file statistical plan data Insurers are mandated to collect and file statistical plan data. They do not have to file with a statistical agent or file directly with state regulators. Only some operational data, as opposed to all, is the basis for statistical plan data.

Which one of the following most accurately depicts the primary purpose of advisory organizations? A. To develop loss costs B. To develop final rates C. To develop external data D. To develop underwriting guidelines

A. To develop loss costs

An alternative to trending historic values during the data preprocessing phase is A. To incorporate trends directly into the underwriting model. B. To use inflation-sensitive exposure bases. C. To adjust historic premiums to current rate levels. D. To identify data sources.

A. To incorporate trends directly into the underwriting model.

Which one of the following correctly describes the purpose of Unicode? A. To provide a representation for each letter and symbol in virtually all of the world's languages B. To replace the standard of the International Organization for Standardization C. To provide a common programming language to be used in all computer systems D. To use a Latin word to represent all of the words in the world's languages

A. To provide a representation for each letter and symbol in virtually all of the world's languages

Which one of the following best describes the purpose of quantitative displays? A. To provide quantitative information about relationships. B. To make presentations professional. C. To present visual content that is not information. D. To make numbers interesting.

A. To provide quantitative information about relationships.

To import an HTML file from the internet, a data scientist A. Uses an element to indicate the path to the file. B. Uses an API to transfer the data. C. Uses a form to provide structure for the data. D. Must request the data in the HTML language only.
