DSOM 3&4
The International Organization of Motor Vehicle Manufacturers (officially known as the Organisation Internationale des Constructeurs d'Automobiles, OICA) provides data on worldwide vehicle production by manufacturer. The table below shows vehicle production numbers for four different manufacturers for five recent years. Data are in millions of vehicles.
Graph III (5 year vehicle); Graph II; GM; GM; Toyota; Toyota; Toyota
Suppose we had a data set from a call center where customers were asked to choose between the following three options: hear account information, billing questions, and customer service. Using the given order of the three options, and using 0-1 dummy variables to encode the categorical variables, which of the following combinations would yield an entry "customer service"?
001
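A minimal sketch of the encoding behind this answer, assuming one 0-1 dummy per option (three dummies, as the three-digit answer implies) and the option order given in the question. The function name `encode` is hypothetical; some treatments use only k-1 dummies instead.

```python
# Options in the order given by the question; the order fixes the codes.
OPTIONS = ["hear account information", "billing questions", "customer service"]

def encode(choice: str) -> str:
    """Return one 0-1 dummy per option, concatenated into a string."""
    return "".join("1" if option == choice else "0" for option in OPTIONS)

# Print the code for every option.
for option in OPTIONS:
    print(option, "->", encode(option))
```

Under these assumptions, "customer service" is the third option, so only the third dummy is 1.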
Which statement is true of an association rule?
It is ultimately judged on how actionable it is and how well it explains the relationship between item sets.
To summarize and analyze data with both a crosstabulation and charting, Excel typically pairs
PivotCharts with PivotTables.
Which one of the following statements is not true concerning PivotTables in Excel?
PivotTables can be built using data arrayed in rows.
The charts that are helpful in making comparisons between categorical variables are
bar charts and column charts.
Complete linkage can be used to measure the distance between _________ in cluster analysis.
clusters
A graphical presentation that uses vertical bars to display the magnitude of quantitative data is known as a
column chart.
The data dashboard for a marketing manager may have KPIs related to
current sales measures and sales by region.
Observation refers to the
set of recorded values of variables associated with a single entity.
A line chart that has no axes but is used to provide information on overall trends for time series data is called a
sparkline
To avoid problems in interpreting the differences in color in a heat map, ____________ can be added.
sparklines
Tables should be used instead of charts when
the values being displayed have different units or very different magnitudes.
If required, round your answers to three decimal places. Do not round intermediate calculations.
Cluster 1: 3.879; Cluster 2: 3.725; Cluster 3: 1.906
__________ is a measure of dissimilarity between clusters that considers only the two most dissimilar observations in the two clusters.
Complete linkage
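As an illustration of the definition above, the sketch below computes complete linkage as the largest pairwise distance between two clusters. Euclidean distance and the sample points are assumptions made for the example; the function name `complete_linkage` is hypothetical.

```python
from itertools import product
from math import dist

def complete_linkage(cluster_a, cluster_b):
    """Distance between the two most dissimilar observations, one from each cluster."""
    return max(dist(p, q) for p, q in product(cluster_a, cluster_b))

# Made-up 2-D observations for illustration only.
cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(4.0, 0.0), (4.0, 3.0)]

d = complete_linkage(cluster_a, cluster_b)
print(d)  # farthest pair is (0, 0) and (4, 3)
```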
The file FDICBankFailures contains data on failures of federally insured banks between 2000 and 2012. Create a PivotTable in Excel to answer the following questions. The PivotTable should group the closing dates of the banks into yearly bins and display the counts of bank closures each year in columns of Excel. Row labels should include the bank locations and allow for grouping the locations into states or viewing by city. You should also sort the PivotTable so that the states with the greatest number of total bank failures between 2000 and 2012 appear at the top of the PivotTable. Click on the datafile logo to reference the data.
GA; 4; Carson City, Las Vegas, Reno; 102; Naples; 4; Chart ii; Peaked; 2010; Decreased
Fields may be chosen to represent all of the following except ____________ in the body of a PivotTable.
filters
Bar charts use
horizontal bars to display the magnitude of the quantitative variable
Data-ink is the ink used in a table or chart that
is necessary to convey the meaning of the data to the audience.
The best way to differentiate chart elements is using
labels.
A time series plot is also known as a
line chart.
When clustering only by dummy variables that represent categorical variables, the simplest measure of similarity between two observations is called the
matching coefficient
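A small sketch of the matching coefficient, taken here as the number of dummy variables on which two observations agree divided by the total number of variables. The observations are made up for illustration, and the function name `matching_coefficient` is hypothetical.

```python
def matching_coefficient(u, v):
    """Share of 0-1 variables on which two observations have the same value."""
    if len(u) != len(v):
        raise ValueError("observations must have the same number of variables")
    matches = sum(1 for a, b in zip(u, v) if a == b)
    return matches / len(u)

# Two made-up observations described by five dummy variables.
obs_1 = [1, 0, 0, 1, 1]
obs_2 = [1, 1, 0, 0, 1]

print(matching_coefficient(obs_1, obs_2))  # agree on 3 of 5 variables
```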
The endpoint of a k-means clustering algorithm occurs when
no further changes are observed in cluster structure and number
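The stopping condition can be sketched as a loop that ends once an assignment pass changes nothing. This is a simplified illustration, assuming Euclidean distance, made-up points, and hand-picked starting centroids; it is not a full k-means implementation.

```python
from math import dist

def k_means(points, initial_centroids, max_iter=100):
    """Alternate assignment and centroid updates until assignments stop changing."""
    centroids = list(initial_centroids)
    assignments = None
    for _ in range(max_iter):
        # Assign each point to its nearest centroid.
        new_assignments = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Endpoint: the cluster structure did not change in this pass.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Recompute each centroid as the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignments, centroids

# Made-up data: two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.0)]
assignments, centroids = k_means(points, [(1.0, 1.0), (9.0, 9.0)])
print(assignments, centroids)
```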