ITM 209 Final Exam
Which of the following are properties of primary keys?
-Each tuple must have a unique primary key -Several distinct attributes could be used together to form a primary key -It is the candidate key that is chosen as the principal means of identifying tuples within a relation
Which of the following is an example of "internet of things" (IOT) data?
-GPS Sensors -Smart Utility Meters -Fitness Sensors
All organizations need to understand and govern PII through which of the following:
-Identifying all sources of created, received, maintained or transmitted PII -Evaluating all external sources of PII -Identifying all human, natural and environmental threats to PII
Which of the following examples could cause a 'butterfly effect' (as defined in the text) in an organization's data?
-Inaccurate customer records -Incomplete purchasing history -A cascading spelling mistake
Which of the following is TRUE about a view:
-It can be used within a database to store table relationships (i.e. a query) for users to access -It conceptually contains the results of a query -If the underlying data in the tables and relations changes, so will the results of the query
What is the most secure type of authentication?
-Something the user knows, such as a user ID and password -Something the user has, such as a smart card or token -Something that is part of the user, such as a fingerprint or voice signature
Which of the following is explained as the reason that humans retain comparative advantage over artificial intelligence when addressing uncertainty and equivocality in decision making?
-Superior intuition -Imagination -Creativity
When considering the colors to use in a visualization, which of the following should be considered?
-Whether the color adds value to the visualization or is just decorative in nature -The manner in which certain color schemes may be interpreted based upon the culture(s) of the audience -The accessibility/readability by individuals with color blindness
Which of the following is a common characteristic of quality data?
-complete -accurate -unique -timely
When inner joining two tables with a many to many relationship, how many inner join clauses are needed in your query?
2
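A minimal sketch of why two inner join clauses are needed, using Python's built-in sqlite3 module. The students, courses, and enrollments tables are hypothetical and do not come from the course material; enrollments is the third (linking) table, and its primary key combines two attributes.

```python
import sqlite3

# Hypothetical schema: students and courses are many-to-many,
# linked through the enrollments table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table students (studentID integer primary key, studentName text);
    create table courses (courseID integer primary key, title text);
    create table enrollments (
        studentID integer references students(studentID),
        courseID integer references courses(courseID),
        primary key (studentID, courseID)
    );
    insert into students values (1, 'Ana'), (2, 'Ben');
    insert into courses values (10, 'Databases'), (20, 'Analytics');
    insert into enrollments values (1, 10), (1, 20), (2, 10);
""")

# Two inner join clauses: one from students to the linking table,
# one from the linking table to courses.
rows = conn.execute("""
    select students.studentName, courses.title
    from students
    inner join enrollments on enrollments.studentID = students.studentID
    inner join courses on courses.courseID = enrollments.courseID
    order by students.studentName, courses.title
""").fetchall()
print(rows)
```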
By what year does Ray Kurzweil predict that machines will be able to achieve the intelligence of human beings?
2029
Which of the following does not describe unstructured data?
A defined length, type, and format.
Which of the following describes what Trend Lines are?
A feature in Tableau that shows a line representing the relationship between a set of plotted data points (i.e., regression)
What is a data lake?
A storage repository that holds a vast amount of raw data in its original format until the business needs it.
Which of the following uses of predictive analytics has variables that are changed due to factors outside the data-generating process and are independent of all other variables?
Active Prediction
Angela works for an identity protection company that maintains large amounts of sensitive customer information such as usernames, passwords, personal information, and social security numbers. Angela and a coworker decide to use the sensitive information to open credit cards in a few of her customers' names. This is a classic example of which of the following security breaches?
An insider.
Which level of maturity describes companies with an analytic culture that make data driven decisions and rely upon analytics for strategic insights?
Analytic Innovators
AJ wants to send Bryan a small message securely. He wants to make sure that only Bryan can read the message, thus ensuring confidentiality. Which of the following encryption methods would he use?
Asymmetric Encryption, with the message encrypted using Bryan's public key.
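The idea of encrypting with the recipient's public key can be sketched with a toy RSA example. This is only an illustration under loud assumptions: the primes are tiny textbook values, the message is a single small number, and nothing here is secure or taken from the course material. Real systems rely on vetted cryptography libraries.

```python
# Toy RSA with tiny textbook primes: for illustration only, never for real use.
p, q = 61, 53
n = p * q                      # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                         # Bryan's public exponent (assumed value)
d = pow(e, -1, phi)            # Bryan's private exponent (needs Python 3.8+)

def encrypt(message: int, public_key: tuple) -> int:
    """Encrypt a number smaller than the modulus with a public key."""
    exponent, modulus = public_key
    return pow(message, exponent, modulus)

def decrypt(ciphertext: int, private_key: tuple) -> int:
    """Recover the original number with the matching private key."""
    exponent, modulus = private_key
    return pow(ciphertext, exponent, modulus)

bryan_public = (e, n)          # shared with everyone, including AJ
bryan_private = (d, n)         # known only to Bryan

message = 65                   # a small message encoded as a number below n
ciphertext = encrypt(message, bryan_public)     # AJ uses Bryan's public key
recovered = decrypt(ciphertext, bryan_private)  # only Bryan's private key reverses it
print(ciphertext, recovered)
```

Because decryption needs the private key, which only Bryan holds, the message stays confidential even if the ciphertext and the public key are both visible to others.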
Which of the following represents the three areas where technology can aid in the defense against security attacks?
Authentication and authorization, prevention and resistance, detection and response.
When using colors to visually distinguish between dimensions on a visualization in Tableau, which of the following uses of color is most commonly applied?
Categorical
What is the flaw with pseudonymization?
Certain data elements can still be used in combination with additional data to identify the data subject it relates to
Which of the following did the article indicate companies are accomplishing through better use of data analytics?
Competitive advantage and innovation
The assurance that messages and information remain available only to those authorized to view them.
Confidentiality
Joe is working with accounts receivable data. The database he is getting data from lists the balance in accounts receivable as $250,654. However, when he adds the accounts receivables from all the sub-accounts (i.e. customer accounts), he gets a value of $251,928. The database may be suffering from integrity issues due to which of the following quality characteristics?
Consistency
The ZN function in Tableau can be used to do which of the following?
Convert null values for a field in a data set to a value of 0
Scott has data that contains a field showing the high temperature in degrees Fahrenheit (ex. 65, 70, 73, 40, etc.) in his town by day. He wants to be able to show the temperature for each day in one of two categories: a) Normal: <80 b) Above Normal: >= 80
Create the following calculated field: IF [temperature] < 80 THEN "Normal" ELSE "Above Normal" END
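For readers more familiar with a general-purpose language, the same two-way branch can be sketched in Python. The function name and the extra readings of 80 and 91 are made up for illustration; this is an analogy to the Tableau calculated field, not Tableau syntax.

```python
def temperature_category(temperature: float) -> str:
    """Mirror of: IF [temperature] < 80 THEN "Normal" ELSE "Above Normal" END"""
    return "Normal" if temperature < 80 else "Above Normal"

highs = [65, 70, 73, 40, 80, 91]  # 80 and 91 are made-up extra readings
categories = [temperature_category(t) for t in highs]
print(categories)
```

Note that exactly 80 falls in "Above Normal", because the condition tests for strictly less than 80.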
Which of the following would be an example of predictive analytics?
Creating an analysis of the sales of a product for the prior year to identify what sales may be in the upcoming year.
Which of the following would be an example of predictive analysis?
Creating an analysis of the sales of a product for the prior year to identify what sales may be for the upcoming year.
If the following join statement is used to join two tables in a query, all of the tuples from which table would appear in the results? right outer join schema.customers on invoices.customerID = customers.customerID
Customers
Which of the following is the process of analyzing data to extract information not offered by the raw data alone?
Data Mining
Eric was asked to setup a visualization summarizing data on patients staying at the hospital based around the number of days they have been there. He has a data set that contains information on patients, which includes the date the patient was admitted (field admittedDate). Doing some research, he found that Tableau uses TODAY() to represent the current date. Which of the following calculated fields in Tableau would identify the number of days that patients have been at the hospital?
DATEDIFF('day',[admittedDate],TODAY())
Which of the following calculated fields in Tableau would identify the number of days it took to complete a service?
DATEDIFF('day',[dateIn],[dateComplete])
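A rough Python analogue of the day-level DATEDIFF calculation, assuming plain calendar dates with no time component. The function name and the sample dates are hypothetical.

```python
from datetime import date

def days_between(start: date, end: date) -> int:
    """Whole days from start to end, analogous to DATEDIFF('day', start, end)."""
    return (end - start).days

date_in = date(2024, 3, 1)         # hypothetical [dateIn]
date_complete = date(2024, 3, 15)  # hypothetical [dateComplete]
service_days = days_between(date_in, date_complete)
print(service_days)

# For a patient still in the hospital, the end date would be today's date,
# which plays the role of TODAY() in the Tableau version.
stay_days = days_between(date_in, date.today())
```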
Collecting information from many sources and storing them together into a single location is referred to as:
Data aggregation
Which of the following is the collection of data from various sources for the purpose of data processing?
Data aggregation
Tools used to find patterns and relationships in large volumes of information that predict future behavior and guide decision making are referred to as:
Data mining tools
Which of the following describes the phenomenon where there is an incentive to record everything?
Datafication
Which of the following is a type of visualization in which you are presenting findings to an audience?
Declarative visualization
Which of the following fields in a data set would usually be found in the Dimensions area in Tableau?
Departments
Bill runs a report of all the sales for the past quarter and puts it into a visualization to show his boss the results. This is an example of what type of analysis?
Descriptive
A summary or interpretation of a data set is an example of:
Descriptive Analytics
Less than half of companies surveyed in 2016 indicated they were effective at using data to guide future strategy, which was down over the prior year.
False
Which of the following keywords when used in an SQL select statement will remove duplicate records from the results?
Distinct
Color and formatting should be used in Tableau to:
Draw attention to relevant data
During which of the following processes does information cleansing usually occur?
ETL processes
Companies use data warehouses for each of the following except:
Enter and process invoices real-time as they are received.
Which of the following is not an example of a primary enterprise system?
Enterprise revenue planning
Which of the following would be an example of causal inference?
Establishing a relationship between tariff levels and the quantity of products imported
The principles and standards that guide our behavior toward other people
Ethics
What governing body passed GDPR?
European Union (EU)
Which of the following can be described as using if-then statements to capture human knowledge?
Expert systems
A data set is a collection of organized or unorganized data.
False
Association analysis groups observations according to some measure of similarity.
False
IBM's Watson can only analyze structured data.
False
Box and whisker plots are used for identifying correlation between two variables.
False
Contemporary database systems provide a three-level hierarchy for naming relations. The top level of the hierarchy consists of schemas, each of which contains catalogs.
False
Data models show the details of the physical view of information for a database.
False
Data within a view is a duplicate copy of the data that is in the underlying tables related to the view.
False
Discrete data can take on any value within a range
False
If data is changed in a table, any views that reference the table will not show the new data until the data in the view is also updated.
False
Intuitive approaches to decision making rely on depth of information, whereas analytical approaches focus on breadth by engaging a problem with a holistic and abstract view.
False
PKE (Public Key Encryption) uses a single common key between the sender and recipient of a message to encrypt and decrypt the message
False
The additional table used to create a many to many relationship will have a primary key that only consists of a single unique attribute in all cases.
False
The difference between a direct causal relationship and an indirect causal relationship is that a direct causal relationship requires a third variable to impact a change in one variable on another variable.
False
The intersect operation does not remove duplicates; intersect all must be utilized.
False
The majority of companies that responded in the study described their companies as "open about sharing data."
False
The majority of organizations reported in the 2016 study that they are analytically challenged (i.e. they rely upon management's intuition more than data for decision making)
False
The only cause of poor quality of data is human error.
False
The problem solving ability of AI is more useful for supporting intuitive rather than analytical decision making.
False
The technique of organizing data into distinct segments that are defined before the analysis begins is referred to as cluster analysis.
False
The validation set of training data for an Advanced Neural Network is used only to test the final solution in order to confirm the actual predictive power of the network.
False
True or False: In most organizations, the managers in the operational areas (such as the manufacturing plant level) would be more interested in less granular information, whereas the executive officers of the organization would be requesting more granular information.
False
Unstructured data extracts information from data and uses it to predict future trends and identify behavioral patterns.
False
When creating relationships between tables, foreign keys are always optional. Only primary keys are needed in each table.
False
What would be the output from a query if the following wildcard pattern were used?
Finds any cities that have "or" in any position
Amazon tracking the behavior of its users is an example of collecting:
First-party data
What does GDPR stand for?
General Data Protection Regulation
Which of the following would violate a foreign-key constraint?
Having a value in the attribute for a foreign key that does not correspond to a value in the table which the foreign key is coming from.
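A small sketch of a foreign-key violation using Python's built-in sqlite3 module. The customers and invoices tables and their rows are hypothetical. SQLite only enforces foreign keys when the foreign_keys pragma is turned on, so that step is specific to this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    create table customers (customerID integer primary key, customerName text);
    create table invoices (
        invoiceID integer primary key,
        customerID integer not null references customers(customerID)
    );
    insert into customers values (1, 'Acme');
    insert into invoices values (100, 1);
""")

try:
    # customerID 99 does not exist in customers, so this insert is rejected.
    conn.execute("insert into invoices values (101, 99)")
    violation = None
except sqlite3.IntegrityError as exc:
    violation = str(exc)
print(violation)
```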
Kegan is creating a visualization in Tableau that shows the average profit per unit sold by country for each of the four (4) products that his company sells. His data has the unit selling price and unit cost for each sale. These values may vary from one sale to the next depending on market price of materials relating to the cost and the unit price negotiated with the customer. The data looks like the following (assume all currency values have been converted to US Dollars already) He created the following calculated field, however he is not sure if it is giving an accurate average profit per unit when he applies it in his visualization using the AVG aggregation by country: ([Unit Price] - [Unit Cost]) / [Quantity] What advice would you give him and why?
He should be using the SUM function: SUM([Unit Price] - [Unit Cost]) / SUM([Quantity]) because some sales may contribute more or less to the average than others based upon
HIPAA is a regulation that applies to which industry?
Healthcare
Which of the following are better at making decisions when there is uncertainty?
Humans
Shell's marketing department did what to find ways to use data smarter?
Implemented mandatory data-driven communications training for all marketers
Which of the following refers to the measure of the quality of information?
Information integrity
Which of the following is decreased when using a relational database?
Information redundancy
Which of the following is NOT a component of Artificial Intelligence?
Intuition Engine
Which of the following WOULD NOT be considered part of the ACCURATE characteristic of high-quality information?
Is aggregate information in agreement with detailed information?
Encryption:
Is used to scramble information into an alternative form that requires a key to read it.
Jill is creating a visualization in Tableau that is plotting points on a map. She decides to use the 'size' mark in her visualization. What does this accomplish?
It differentiates the points based upon the values of the measures used by making larger values visually bigger points.
What is the role of a foreign key?
It is an attribute that is the primary key of one table that appears as an attribute in another table. It acts to provide a logical relationship between the two tables
What are the first two lines of defense a company should take when addressing security risks?
People first, technology second.
Which of the following describes a full outer join?
It preserves tuples in both relations.
Which of the following is an example of lag information?
KPIs (key performance indicators)
Which of the following charts are good for showing data changes over time?
Line chart
What does PII stand for?
Personally Identifiable Information
Which of the following are technology challenges for big data?
Managing huge volumes of data, Managing streams at an extremely fast and variable pace, Managing a variety of forms and functions of data, Processing data at a huge speed
When did GDPR become effective?
May 25, 2018
Which of the following are all the same value in a normal distribution?
Mean, Median, Mode
The type of qualitative data that cannot be ranked, but can be used to count, group and take a proportion is:
Nominal
Which of the following is the first line of defense in securing information?
People
Jeff is preparing an analysis of sales year over year to determine what sales may be in the upcoming year based upon the relative seasonal sales cycle that his company experiences. This would be an example of what type of data analysis?
Predictive Analysis
Joey is creating a model based upon past stock trading information. The purpose is to indicate to management the best stock derivative arrangements and when to enter into them. This would be an example of what type of analysis?
Prescriptive
What uses techniques that create models indicating the best decision to make or course of action to take?
Prescriptive analytics.
Which of the following is used to uniquely identify a row (or tuple) in a table?
Primary key
The right to be left alone when you want to be.
Privacy
Nominal Data and Ordinal Data both are types of:
Qualitative Data
What feature of Tableau would you utilize to label the percent of total that a slice of a pie chart makes up (such as 5.5%)?
Quick table calculation
When using diverging colors on a diagram, which of the following is one of the least desirable color schemes when considering the ability for those with color blindness to be able to effectively read / use the visualization?
Red-Green Diverging
Tree maps and heat maps use which of the following to show proportional size of values?
Size and color
Which of the following is referred to as the use of social skills to trick people into revealing access credentials or other valuable information?
Social engineering
The pattern of reading that was originally based upon eye tracking behavior on websites but is applied to visualizations in general when determining the best layout for a dashboard is referred to as:
The F Pattern
What should be your focus when designing your visualization?
The audience
What should be your primary focus when designing your visualization?
The audience
With reference to data granularity, which of the following groups of individuals would typically want to see information at the least granular (i.e. more coarse) level?
The board of directors
Which of the following is a characteristic of a data lake?
The data is stored in raw form until needed for processing or analysis
Artificial Neural Networks are designed after which of the following:
The human brain
Size, color, label and detail are all examples of Tableau features that are found where?
The marks card
A null value means:
The value is unknown or does not exist
Early systems of AI used deterministic hard-coded logic. Which of the following describes why this method of creating AI became tenuous?
The world's store of information kept growing
When Artificial Neural Networks are referred to as black boxes, which of the following is being referred to?
They provide little guidance on the intuitive logic behind their predictions
Information itself has no ethics. Therefore, who is responsible for developing ethical guidelines about how to manage it?
Those who own the information
When creating a histogram, what is the purpose of using the 'create bins' feature within Tableau?
To group together bands of values into buckets for measures that represent continuous data
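The idea of binning can be sketched in Python: each continuous value is assigned to a fixed-width bucket, and the histogram counts the values per bucket. The ages list, the bin size of 10, and the helper name are assumptions for illustration, not Tableau's implementation.

```python
from collections import Counter

def bin_label(value: float, bin_size: float) -> float:
    """Return the lower edge of the fixed-width bin that contains value."""
    return (value // bin_size) * bin_size

ages = [21, 24, 25, 29, 30, 38, 41, 47]  # made-up continuous measure
histogram = Counter(bin_label(age, 10) for age in ages)
for lower_edge in sorted(histogram):
    print(f"{lower_edge}-{lower_edge + 10}: {histogram[lower_edge]}")
```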
What is the where clause in an SQL statement used for?
To select only those rows in the result relation of the from clause that satisfy a specified predicate.
Which of the following sets of data are used in machine learning to adjust the weights on the neural network?
Training Set
What chart type would be best to show the hierarchical nature of data?
Tree Map
What chart type would be the best to show the hierarchical nature of data (i.e. how sub-components build up to their parent components)?
Tree Map
Which function would be used in Tableau to show a line that represents the relationships between a set of data points that have been plotted?
Trend Lines
A person can act legally but not be acting ethically
True
A request for information from a database is called a query.
True
Which of the following applies to many to many relationships but not to one to many relationships?
You need a third table to create the relationship
A schema diagram is a pictorial depiction of the schema of a database that shows the relations in the database, their attributes, and primary keys and foreign keys.
True
Analytic Innovators are more than 60% more likely than Analytic Practitioners to use analytics for innovations that lead to new products, services, and processes to improve existing ones.
True
Analytical Innovators use data and analytics both to innovate incrementally in existing products, services and processes and to create all new products, services and business models
True
Big data is growing at an exponential rate.
True
Companies that are using analytics to automate processes in the business are gaining benefits through employees having more time to work on higher-value-added tasks.
True
Deep learning is a subset of machine learning.
True
Dumpster diving is a method of obtaining information from users by going through discarded items (e.g. trash)
True
Human-AI symbiosis is effective because it allows for a blend of both analytic and intuitive approaches to decision making
True
In a SQL statement, union is used to join two queries together.
True
In a full outer join, all the records from the right and left tables that meet the criteria of the query will appear. This would include records from each table where there are no related records (tuples) in the other table.
True
In order to perform any actions on a database, a user (or a program such as MySQL Workbench) must first connect to a database.
True
One example of continuous data is distance.
True
One example of continuous data is height.
True
Qualitative data is categorical data.
True
Tableau allows for connections to live data in a database for purposes of having dashboards that can be refreshed periodically at a predetermined frequency.
True
Text in a novel is an example of unstructured data.
True
The select clause of the statement is used to list the attributes desired in the result of a query.
True
The use of the and logical connective is to find tuples that meet two or more criteria.
True
True or False: Organizations may have inconsistent data definitions between their production systems / databases. This may be a reason for the organization to utilize a data warehouse.
True
Using the 'as' statement in the select clause for a query will label the column or attributes header in the results with the specified text. For example: select people.personName as 'Name' would return 'Name' as the column header rather than personName.
True
With enough training through machine learning, a neural network can learn enough to begin to match the predictive accuracy of a human expert
True
Which of the following describes the veracity characteristics of big data?
Uncertainty and/or untrustworthiness of data
Big data is mostly, over 90 percent:
Unstructured data
Donovan is creating a chart that utilizes a map. He wants to have the map show the borders of the different counties within each state. Where would he go to enable this on the map?
Use the Map Styles menu option
Which describes prescriptive analytics?
Uses techniques that create models indicating the best decision to make or course of action to take.
Which method of protecting data is better when considering the value of the data once personally identifiable information has been removed?
Using statistical approaches to convert original data to synthetic
Which of the following IS NOT one of the five common characteristics of quality data? (as described in the text and in class)
Valid
Which of the following describes the speed of data?
Velocity
Which of the following may be indicators of big data?
Velocity Veracity Variety Volume
Which of the following would be a reason to utilize a one-to-one (1-1) relationship?
When you have attributes about tuples (records) for which not every tuple may have information for the attribute. For example, if you were recording information about people and did not record physical characteristics such as height for every person, you may create a 1-1 relationship.
When should you use multiple colors?
When you need to differentiate types of data
Which of the following charts functions well for showing proportions (vs. quantitative data)?
Word Clouds
Which of the following charts is described in the chapter as functioning well for showing proportions (vs. quantitative data)?
Word Clouds
Which of the following would be most likely to contain the most unstructured data?
Your personal music library
Joe is doing an analysis of his investment portfolio. His data contains variables that change due to factors outside the data-generating process and are independent of all other variables in the data. Which of the following uses of predictive analytics describes the type of prediction he is doing?
active prediction
The options for order by when writing a SQL statement are:
asc, desc
If the following join statement is used to join two tables in a query, all of the tuples from which table(s) would appear in the results? full outer join schema.customers on invoices.customerID = customers.customerID
both customers and invoices
The as clause:
can be used to rename attributes in the results of the query.
Which aggregation function shows the number of records that meet a set of criteria?
count
Joe is in the process of trying to eliminate all duplicate records and correct any records in the database where a relationship between the tuples in two related tables no longer exists. This would be an example of:
data scrubbing/cleansing
The purpose of integrity constraints is to:
ensure that changes made to the database do not result in a loss of data consistency.
Regression models are used to:
estimate the relationships among variables
The three factors of the variety of data are:
form, function, source
The global head of CRM (customer relationship management) was positive towards the changes required with GDPR with regard to getting customer consent to collect their data for marketing because it would:
increase data quality
Governs the ethical and moral issues arising from the development and use of information technologies, as well as the creation, collection, duplication, distribution, and processing of information.
Information ethics
Considering the following, which would be the correct inner join clause to use in the query: - The two tables being joined are prescriptions and patients. A patient may have multiple prescriptions. A prescription can only relate to a single patient. - The select clause is selecting the following fields: patients.name,patients.dateOfBirth,prescriptions.rxNumber, prescriptions.medication, prescriptions.dosage - The query contained 'from pharmacy.patients' for the from clause - The primary key of patients is patients.patientID - The primary key of prescriptions is prescriptions.rxNumber
inner join pharmacy.prescriptions on prescriptions.patientID = patients.patientID
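A runnable sketch of this join using Python's built-in sqlite3 module. The sample rows are made up, and the pharmacy schema prefix is dropped because an in-memory SQLite database has no such schema; otherwise the join clause follows the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table patients (patientID integer primary key, name text, dateOfBirth text);
    create table prescriptions (
        rxNumber integer primary key,
        patientID integer references patients(patientID),
        medication text,
        dosage text
    );
    insert into patients values (1, 'Pat Lee', '1980-02-14'), (2, 'Sam Roy', '1975-09-30');
    insert into prescriptions values
        (500, 1, 'Amoxicillin', '250 mg'),
        (501, 1, 'Ibuprofen', '200 mg');
""")

# The foreign key prescriptions.patientID matches the primary key of patients.
rows = conn.execute("""
    select patients.name, patients.dateOfBirth, prescriptions.rxNumber,
           prescriptions.medication, prescriptions.dosage
    from patients
    inner join prescriptions on prescriptions.patientID = patients.patientID
    order by prescriptions.rxNumber
""").fetchall()
print(rows)
```

The patient with no prescriptions does not appear, since an inner join keeps only rows with a match in both tables.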
Which of the following is the human capacity to analyze alternatives with deep perception, transcending ordinary-level functioning based on simple rational thinking?
intuitive intelligence
If the following join statement is used to join two tables in a query, all of the tuples from which table would appear in the results? left outer join schema.customers on invoices.customerID = customers.customerID
invoices
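A minimal sketch of the left outer join case with Python's built-in sqlite3 module, assuming the from clause names invoices (which the answer implies). The rows are made up: one invoice has no customer, and one customer has no invoice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customers (customerID integer primary key, customerName text);
    create table invoices (invoiceID integer primary key, customerID integer);
    insert into customers values (1, 'Acme'), (2, 'Globex');
    insert into invoices values (100, 1), (101, null);
""")

# invoices is the left table, so every invoice is preserved;
# unmatched customer columns come back as null (None in Python).
rows = conn.execute("""
    select invoices.invoiceID, customers.customerName
    from invoices
    left outer join customers on invoices.customerID = customers.customerID
    order by invoices.invoiceID
""").fetchall()
print(rows)
```

Globex never appears because it has no invoice, while invoice 101 is kept even though it has no matching customer.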
A digital certificate:
is a data file that identifies individuals or organizations online
Which of the following is used in a SQL statement where clause to show all records where a particular attribute has null values?
is null
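A short sketch of why is null is needed rather than an equals comparison, using Python's built-in sqlite3 module and a hypothetical customers table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customers (customerID integer primary key, customerName text, phone text);
    insert into customers values (1, 'Acme', '555-0100'), (2, 'Globex', null);
""")

# is null finds rows whose value is unknown or missing.
missing_phone = conn.execute(
    "select customerName from customers where phone is null"
).fetchall()

# Comparing with = null never evaluates to true, so no rows come back.
equals_null = conn.execute(
    "select customerName from customers where phone = null"
).fetchall()
print(missing_phone, equals_null)
```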
When creating a view, the data that is returned from querying the view:
is stored in the tables that the view queries.
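The behavior of a view can be sketched with Python's built-in sqlite3 module. The employees table and the sales_staff view are hypothetical. The view stores only the query, so its results follow the underlying table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table employees (employeeID integer primary key, employeeName text, dept text);
    insert into employees values (1, 'Ana', 'Sales'), (2, 'Ben', 'IT');
    create view sales_staff as
        select employeeName from employees where dept = 'Sales';
""")

query = "select employeeName from sales_staff order by employeeName"
before = conn.execute(query).fetchall()

# Change the underlying table only; the view itself is never updated.
conn.execute("insert into employees values (3, 'Cy', 'Sales')")
after = conn.execute(query).fetchall()
print(before, after)
```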
The integrity constraint that requires that an attribute in a tuple not be blank (i.e. no value) is:
not null
What is needed to train a neural network?
large amounts of data
Details about the data are referred to as:
metadata
The operator like in a SQL statement is used for:
pattern matching
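A brief sketch of pattern matching with like, using Python's built-in sqlite3 module. The cities table and the '%or%' pattern are assumptions chosen for illustration; the percent sign matches any run of characters, so this pattern finds "or" in any position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table cities (cityName text);
    insert into cities values ('Orlando'), ('New York'), ('Portland'), ('Boston');
""")

# Note: SQLite's like is case-insensitive for ASCII letters by default,
# so 'Orlando' matches; other database systems may be case-sensitive.
rows = conn.execute(
    "select cityName from cities where cityName like '%or%' order by cityName"
).fetchall()
matches = [row[0] for row in rows]
print(matches)
```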
Which of the following is NOT a type of pattern analysis:
perfunctory analysis
Which of the following is "the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information"?
pseudonymization
Which of the following describes the basic premise of how an Artificial Neural Network works?
receive inputs, process the inputs, provide an output
The concept that a value that appears in one relation for a given set of attributes must appear for a set of attributes in another relation is:
referential integrity
Sharing information with other companies for mutual benefit is an example of:
second-party data
Which of the following SQL statements will provide all the tuples (records) and attributes from the table 'employees' for individuals who are less than 40 years old and who make more than $150,000?
select * from hr.employees where employees.age < 40 and employees.income > 150000
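The statement can be tried out with Python's built-in sqlite3 module. The rows are made up, and the hr schema prefix is dropped because an in-memory SQLite database has no such schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table employees (
        employeeID integer primary key, employeeName text, age integer, income integer
    );
    insert into employees values
        (1, 'Ana', 35, 180000),
        (2, 'Ben', 45, 200000),
        (3, 'Cy', 29, 90000);
""")

# Both conditions must hold for a row to be returned.
rows = conn.execute(
    "select * from employees where employees.age < 40 and employees.income > 150000"
).fetchall()
print(rows)
```

Ben fails the age condition and Cy fails the income condition, so only Ana satisfies both.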
Lauren is querying a data set and the results she keeps getting has a lot of duplicate rows returned. She would like to remove duplicates from the results and only display unique rows of data. What function in SQL would she use in her query?
select distinct
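A small sketch of select distinct with Python's built-in sqlite3 module, using a hypothetical orders table that contains a repeated city.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table orders (city text);
    insert into orders values ('Boston'), ('Boston'), ('Denver');
""")

all_rows = conn.execute("select city from orders order by city").fetchall()
unique_rows = conn.execute("select distinct city from orders order by city").fetchall()
print(all_rows, unique_rows)
```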
The three basic clauses of a SQL statement to select data are:
select, from, where
Pattern discovery is:
the process of identifying distinctive relationships between observations in a data set.
A variable in a data set is considered to be exogenously altered if:
the variable changes due to factors outside the data-generating process, such as the analyst making a change to a variable to identify if there is an impact on another variable.
Purchasing data from an organization that collected it is referred to as:
third-party data
Which of the following is characterized as a lack of information about all alternatives or their consequences?
uncertainty
The integrity constraint that requires that no two tuples can have the same value for an attribute is:
unique
When is having used instead of where?
when groups are present through the use of an aggregate function (such as avg, count, etc.) and conditions need to be applied to the groups.
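The difference between where and having can be sketched with Python's built-in sqlite3 module. The sales table, its rows, and the thresholds are hypothetical: where filters individual rows before grouping, and having filters the groups after the aggregate is computed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table sales (region text, amount integer);
    insert into sales values
        ('East', 100), ('East', 300),
        ('West', 50), ('West', 70),
        ('North', 500);
""")

rows = conn.execute("""
    select region, avg(amount) as avgAmount
    from sales
    where amount > 60            -- row-level filter, applied before grouping
    group by region
    having avg(amount) > 150     -- group-level filter on the aggregate
    order by region
""").fetchall()
print(rows)
```

West survives the where clause with one row (70) but is removed by having, because its group average is not above 150.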