wpc 300 final
a manager wishes to predict the annual cost (y) of an automobile based on the number of miles (x) driven. the following model was developed: y = $1500 + 0.6x. if a car is driven 15000 miles, the predicted cost of the car is...
$10500
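a quick check of the arithmetic, as a minimal python sketch. the function name `predicted_annual_cost` is made up for illustration; the intercept and slope come from the model in the card.

```python
def predicted_annual_cost(miles: float) -> float:
    """linear model from the card: y = 1500 + 0.6 * x."""
    return 1500 + 0.6 * miles


cost = predicted_annual_cost(15000)
print(f"${cost:,.0f}")
```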
what is the confidence level when the level of significance is .06?
.94 (1 - .06)
the value of r-squared always falls between _____ and _____ inclusive
0 and 1
the WPC sports company has noted that the size of an individual customer order is normally distributed with a mean of $100 and a standard deviation of $10. if a soccer team of 25 players were to make the next batch of orders, what would be the standard error of the mean?
2 (sigma/sqrt(n) = 10/sqrt(25))
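the same calculation as a small python sketch. `standard_error` is an illustrative helper name, not something from the course material.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


se = standard_error(sigma=10, n=25)
print(se)  # 2.0
```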
the correlation coefficient between the age of an auto and the money spent to repair is .8. which of the following statements is true? -90% of the repair cost will be explained by the age of an auto -81% of the variation in the money spent on repairs is explained by the age of the auto -81% of money spent on repairs is explained by the age of an auto -64% of the variation in the money spent on repairs is explained by the age of the auto
64% of the variation in the money spent on repairs is explained by the age of the auto (.8 squared = .64)
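a one-line check that squaring the correlation coefficient gives the share of variation explained. variable names are illustrative only.

```python
r = 0.8             # correlation coefficient from the card
r_squared = r ** 2  # coefficient of determination

print(f"{r_squared:.0%} of the variation is explained")
```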
when you access information from two different tables connected by an identifier key, the SQL keyword you should use is...
INNER JOIN
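a minimal sketch of an INNER JOIN, run through python's built-in `sqlite3` module. the `customer` and `orders` tables, their columns and their rows are all hypothetical; only rows whose key appears in both tables come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'ana'), (2, 'ben'), (3, 'cy');
    INSERT INTO orders VALUES (10, 1, 95.0), (11, 1, 110.0), (12, 2, 100.0);
""")

# customer_id is the identifier key that connects the two tables
rows = conn.execute("""
    SELECT customer.name, orders.amount
    FROM customer
    INNER JOIN orders ON customer.customer_id = orders.customer_id
    ORDER BY orders.order_id
""").fetchall()

print(rows)  # 'cy' has no orders, so no row for 'cy' is returned
```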
what tool helps in periodic managerial decision making?
OLAP
the SQL code to extract only departure time information for all records of the following flight table is:
SELECT departs FROM flight
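the flight table itself is not reproduced in these notes, so the sketch below makes one up: `flight_no`, `origin` and the sample rows are assumptions, and only the `departs` column and the query come from the card.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE flight (flight_no TEXT, origin TEXT, departs TEXT);
    INSERT INTO flight VALUES ('F1', 'PHX', '08:15'), ('F2', 'LAX', '13:40');
""")

# naming a single column returns only that column, for every record
departure_times = [row[0] for row in conn.execute("SELECT departs FROM flight")]
print(departure_times)
```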
a researcher wants to find out if a lack of exercise leads to weight gain. which of the following variables could not be considered a confounder? -weight -exercise level -both exercise level and weight -age
both exercise level and weight
what is the first stage of agglomerative hierarchical clustering?
joining the two clusters that are closest to each other
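a rough sketch of that first merge, assuming every observation starts as its own cluster. the 1-D data points are hypothetical.

```python
from itertools import combinations

points = [1.0, 2.5, 6.0, 6.4, 9.0]  # hypothetical observations
clusters = [[p] for p in points]    # start: each point is its own cluster

# find the pair of singleton clusters with the smallest distance
i, j = min(
    combinations(range(len(clusters)), 2),
    key=lambda pair: abs(clusters[pair[0]][0] - clusters[pair[1]][0]),
)

# merge that pair and keep every other cluster unchanged
merged = clusters[i] + clusters[j]
clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
print(clusters)
```

repeating the same step on the new list of clusters (with a linkage rule for multi-point clusters) builds the rest of the hierarchy.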
what are the three principles of describing numeric data?
center, spread and shape
when two variables are highly negatively correlated, the correlation coefficient will be...
close to -1
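a small illustration with pearson's r computed by hand. the data (car age in years against resale value in thousands) is invented for the example.

```python
import math


def pearson_r(xs, ys):
    """pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


age = [1, 2, 3, 4, 5]        # hypothetical
value = [20, 17, 15, 11, 9]  # hypothetical, falls as age rises
r = pearson_r(age, value)
print(round(r, 3))
```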
in the data extraction process for an ETL, what is not an example of a legitimate data source?
competitors' data
when sample size increases...
the confidence interval gets narrower
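a sketch of why, assuming a z-interval with a known population standard deviation (an assumption for simplicity; the card does not say which interval is meant). the margin of error is z * sigma / sqrt(n), so it shrinks as n grows.

```python
import math
from statistics import NormalDist


def margin_of_error(sigma: float, n: int, confidence: float = 0.95) -> float:
    """half-width of a z-based confidence interval for the mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / math.sqrt(n)


for n in (25, 100, 400):
    print(n, round(margin_of_error(sigma=10, n=n), 2))
```

quadrupling the sample size halves the margin of error.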
which of the following is not an application of clustering analysis? -market segmentation analysis -crime prediction analysis -web click stream analysis -collaborative filtering analysis
crime prediction analysis
in an ETL process, data is loaded into a final target database such as:
data warehouse
in the loading phase of an ETL tool, the transformed data gets loaded into an end target, usually the...
data warehouse
what are the four types of analytical methods?
descriptive, explanatory, predictive and prescriptive
_____ refers to a bias that causes an individual to value an owned object higher than its market value
endowment effect
in an agile approach of analytics, what is the last step of the process?
evaluate and improve
which of the following statements is true? -experimentation is a way of analytical thinking -analytical thinking is not based on facts -using intuition is a way of analytical thinking -heuristic thinking is slow
experimentation is a way of analytical thinking
deleting the grid lines in a chart...
increases the data-ink ratio
what is an example of primary data?
interviews
which of the following is true about multicollinearity? -regression coefficients become clearer and are easier to interpret -is measured using the statistical variance inflation factor (VIF) -p value reduces significantly leading to rejection of the null hypothesis -the effect of a dependent variable on another becomes difficult to isolate
is measured using the statistical variance inflation factor (VIF)
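a minimal sketch of the VIF for the two-predictor case. in general VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors; with only two predictors that R^2 is just their squared correlation. the data and the helper name `vif_two_predictors` are hypothetical.

```python
def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    var1 = sum((a - m1) ** 2 for a in x1)
    var2 = sum((b - m2) ** 2 for b in x2)
    r_squared = cov ** 2 / (var1 * var2)
    return 1 / (1 - r_squared)


income = [30, 40, 50, 60, 70]    # hypothetical predictor
spending = [28, 41, 52, 58, 71]  # hypothetical, moves almost in step with income
vif = vif_two_predictors(income, spending)
print(round(vif, 1))
```

a VIF of 1 means no collinearity; a common rule of thumb treats values above roughly 5 to 10 as a warning sign.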
you need to find out if a customer will buy your product or not. an appropriate sample data is available from your current customer base. what analysis method will be appropriate for this study?
logistic regression
what is the true focus of information architecture?
making all information easy to find
visualizations of spatial data are most illustrative when shown using...
maps
the average value of nominal data is measured by...
mode
kurtosis of a normal data distribution is a...
measure of data shape
regular consumption of organic food will keep you in a good mood. in this example, the confounder could be...
money
which of the following assumptions is not true for simple linear regression? -relationship between dependent and independent variable should be linear -correlations between the dependent and independent variables -residuals are normally distributed -multicollinearity effect between independent variables
multicollinearity effect between independent variables
the numbers on basketball jerseys are an example of...
nominal data
which of the following propositions describes an existing theory or belief? -alternative hypothesis -null hypothesis -proportion -standard deviation
null hypothesis
predictive analytics may be applied to _____, which is a set of techniques that use descriptive data and forecasts to identify the decisions most likely to result in the best performance
prescriptive analytics
which data analytics model uses optimization techniques?
prescriptive analytics
what is an important task of a database management system?
provides support such as performing maintenance and routine backups
the unexplained variance in the regression analysis is also known as:
residual variance
in inferential statistics, a _____ is used to infer about a _____
sample, population
the central limit theorem states that if the population is normally distributed, then the...
sampling distribution of the mean will also be normal for any sample size
what is not a step used in the k means clustering algorithm?
select cluster centers in such a way that they are as close as possible to each other
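for contrast, a bare-bones sketch of the steps k means does use: assign each point to its nearest center, recompute each center as the mean of its points, and repeat until nothing changes. the 1-D data and the starting centers are hypothetical; real implementations usually pick starting centers at random or spread far apart.

```python
def kmeans_1d(points, centers, max_iter=100):
    """plain k means on 1-D data, starting from the given centers."""
    groups = []
    for _step in range(max_iter):
        # assignment step: each point joins the cluster of its nearest center
        groups = [[] for _center in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: abs(p - centers[k]))
            groups[nearest].append(p)
        # update step: move each center to the mean of its points
        # (an empty cluster keeps its old center)
        new_centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, groups


centers, groups = kmeans_1d([1, 2, 3, 10, 11, 12], centers=[1.0, 12.0])
print(centers, groups)
```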
florence nightingale's rose diagram tells a story about...
soldiers' causes of mortality in the hospital during the war
when you keep eating the food you don't like precisely because you already bought the food, you are committing...
sunk-cost fallacy
which of the following categories of data mining would you use for spam filtering of emails? -unsupervised -both supervised and unsupervised -heuristics -supervised
supervised
what is a reason not to use a table for data visualization?
tables cannot easily show trends
in the target story, why did target send the teen daughter maternity ads?
target's analytics model suggested she was pregnant based on her buying habits
which of the following violates the principle of data visualization? -avoid unnecessary chart junk -the chart should tell a story -the data ink ratio should be higher than 1 -the lie factor should be closely equal to 1
the data ink ratio should be higher than 1
which of the following is true of hierarchical clustering? -the data partition does not occur in a single step -all clusters must have more than one object in it -no single cluster can have all data -all clusters must have the same number of data
the data partition does not occur in a single step
what is the definition of distance between two clusters in a single linkage clustering?
the distance between the least distant pair of objects, one from each group
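the definition translates almost directly into code. a small sketch with made-up 2-D points; `single_linkage_distance` is an illustrative name.

```python
import math


def single_linkage_distance(cluster_a, cluster_b):
    """smallest euclidean distance over all pairs, one point from each cluster."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)


a = [(0, 0), (1, 0)]  # hypothetical cluster
b = [(4, 0), (9, 0)]  # hypothetical cluster
d = single_linkage_distance(a, b)
print(d)  # 3.0, the gap between (1, 0) and (4, 0)
```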
what is a useful principle of data visualization?
the graph suggests a possible true effect
you arrived at a significant test statistic (p value < .05) when comparing responses from three treatment groups in a one-way ANOVA. how would you interpret the alternative hypothesis for this test?
the mean response from at least one treatment group is different from that of the others
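a sketch of the F statistic behind a one-way ANOVA, computed by hand on invented data for three treatment groups. turning F into a p value needs the F distribution (for example from a statistics package), which is left out here.

```python
def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# hypothetical responses; the third group clearly sits apart from the other two
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [7, 8, 9]])
print(round(f_stat, 2))
```

a large F only says that at least one group mean differs; it does not say which one.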
which of the following is not a continuous random variable? -the time to complete a specific task -the outcomes of rolling two dice -the possible amount of rain on a given day -the pollution level in the air around us
the outcomes of rolling two dice
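one way to see this: the possible totals of two dice can be listed out completely, which is not possible for a continuous quantity such as time or rainfall. a short enumeration:

```python
from itertools import product

# every possible total when rolling two six-sided dice
totals = sorted({a + b for a, b in product(range(1, 7), repeat=2)})
print(totals)       # a finite, countable list of values
print(len(totals))  # 11
```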
what is a type-II error?
the research hypothesis is actually true, and the test incorrectly fails to reject the null hypothesis
when the lie factor of a graphical chart is more than 1...
the size of the effect shown in the graph is bigger than the actual effect in the data
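a sketch of tufte's lie factor: the relative change shown in the graphic divided by the relative change in the data. the chart measurements below are hypothetical.

```python
def relative_change(first: float, second: float) -> float:
    """size of an effect as a fraction of the starting value."""
    return abs(second - first) / abs(first)


def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """effect shown in the graphic divided by the effect in the data."""
    return relative_change(graphic_first, graphic_second) / relative_change(
        data_first, data_second
    )


# hypothetical chart: the data grows from 100 to 150 (+50%),
# but the bars are drawn 2 cm and 5 cm tall (+150%)
lf = lie_factor(2, 5, 100, 150)
print(lf)  # 3.0
```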
what is the difference between the t-distribution and the standard normal (z) distribution?
the t-distribution has a larger variance than the standard normal distribution
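to put numbers on it: for more than 2 degrees of freedom the variance of the t-distribution is df / (df - 2), which is always above the standard normal's variance of 1 and approaches 1 as df grows. `t_variance` is an illustrative helper.

```python
def t_variance(df: float) -> float:
    """variance of student's t-distribution (finite only for df > 2)."""
    if df <= 2:
        raise ValueError("variance is not finite for df <= 2")
    return df / (df - 2)


for df in (3, 10, 30, 1000):
    print(df, round(t_variance(df), 3))
```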
a correlation coefficient between the college entrance exam grades and scholastic achievement was found to be 1.08. on the basis of this, you would tell the university that:
they should hire a new statistician (r must satisfy -1 <= r <= 1)
which of the following is true of A/B testing? -you should test multiple elements of your landing page at a time and compare -you need to attend WPC 300 course to learn about A/B testing -a neutral result on an A/B testing means you correctly performed the test -to increase conversion rate of your website traffic, A/B testing can be beneficial
to increase conversion rate of your website traffic, A/B testing can be beneficial
which of the following is not a true statement? -in the cluster analysis, the objects within clusters should exhibit a high amount of similarity -reducing SSE within cluster increases cohesion -the k means algorithm is a method for doing partitional clustering -to predict sales from transactional data one should perform clustering analysis
to predict sales from transactional data one should perform clustering analysis
when you are asked to design a database for an airline ticket reservation system, based on an entity relationship data model, what could be an example of an "entity"?
traveler
in the experimental design to test the efficacy of IQ water, the IQ water is called...
the treatment
in confirmatory visualization...
users expect to see a certain pattern in the data
which of the following is not a traditional data architectural process? -conceptual -physical -visual -logical
visual
which of the following is not a data visualization tool? -sisense -weka -qlik -domo
weka
what is a violation of a basic data visualization principle?
when the chart does not have graphical integrity
in order to reject the null hypothesis, the p value must be less than the...
alpha
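the decision rule written out as a tiny sketch. the default alpha of .05 is only a common convention, and `reject_null` is an illustrative name.

```python
def reject_null(p_value: float, alpha: float = 0.05) -> bool:
    """reject the null hypothesis only when the p value is below alpha."""
    return p_value < alpha


print(reject_null(0.03))              # True: .03 is below .05
print(reject_null(0.03, alpha=0.01))  # False: .03 is not below .01
```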
what is not a component of a relational database?
CPU of database server
which of the following is a cloud service provider? -dropbox -VMWare -gmail -icloud
VMWare
gambler's fallacy is...
a clustering illusion
a market analyst is developing a regression model to predict monthly household expenditures on groceries as a function of family size, household income and household neighborhood. the household expenditure variable in the model is...
a dependent variable
which of the following statements is true based on the following regression equation? IQ = 4 + 9.6 × reading level -reading level is not a good predictor of IQ -a unit point change in IQ will result in a 5.6 point increase in reading level -a unit point change in reading level will increase IQ by 5.6 points -a unit point change in reading level will result in a 9.6 point increase in IQ level
a unit point change in reading level will result in a 9.6 point increase in IQ level
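a quick numeric check, assuming the card's equation is IQ = 4 + 9.6 * reading level, the reading that the answer implies. the slope is the change in predicted IQ per one-point change in reading level; the intercept (4) drops out of the difference.

```python
def predicted_iq(reading_level: float) -> float:
    """assumed form of the card's model: IQ = 4 + 9.6 * reading level."""
    return 4 + 9.6 * reading_level


# predicted change in IQ when reading level goes up by one point
change = predicted_iq(6) - predicted_iq(5)
print(round(change, 1))
```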
you are creating a database to store temperature and wind from various airports. what is the most likely candidate to use as the basis for a primary key in the airport table?
airport code
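a small sketch of what the primary key buys you, using python's built-in `sqlite3`. the column names and the sample airport row are assumptions; the point is that a second row with the same airport code is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airport (airport_code TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO airport VALUES ('PHX', 'phoenix sky harbor')")

try:
    # same airport code again: violates the primary key's uniqueness
    conn.execute("INSERT INTO airport VALUES ('PHX', 'duplicate row')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```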