chapter 5 from the book problems (homework)

delimiter

A character, or series of characters, that marks the end of one field and the beginning of the next field

ETL process

A set of procedures for blending data. The acronym stands for extract, transform, and load data.
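
As an illustration only, the three ETL steps can be sketched in a few lines of Python. The sample CSV text, the table name `spend`, and the function names are made up for this sketch; they do not come from the chapter.

```python
import csv
import io
import sqlite3

# Hypothetical source data: a tiny CSV export with inconsistent spacing.
RAW_CSV = "campaign,spend\nspring, 100.50\nsummer,200\n"

def extract(text):
    """Extract: read the raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim the text and convert spend to a number."""
    return [(r["campaign"].strip(), float(r["spend"])) for r in rows]

def load(records, conn):
    """Load: write the cleaned records into the destination table."""
    conn.execute("CREATE TABLE IF NOT EXISTS spend (campaign TEXT, amount REAL)")
    conn.executemany("INSERT INTO spend VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory database, assumed for the demo
load(transform(extract(RAW_CSV)), conn)
total = conn.execute("SELECT SUM(amount) FROM spend").fetchone()[0]
```

After the load step the data can be queried like any other table; here `total` holds the summed spend across both campaigns.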

flat file

A text file that contains data from multiple tables or sources and merges that data into a single row

A company runs many social media campaigns to increase sales. The company collects data about the amount spent on each ad campaign, the number of people who click on each ad, whether each person clicking on an ad completed a purchase, and the location (city and country) of each person who clicked on an ad.

All of these types of data are structured data because they are highly organized and fit into fixed fields. For example, the amount of money spent is a financial number that can be rounded to two decimal places, the number of people is an integer value, whether a person completes a purchase can be stored as an integer (for example, 1 or 0), and the location is a defined text value.

A company scrapes data from a review website where customers can write in about products they have purchased. The company analyzes each of the reviews but only records the number of words in the review, a rating of the tone of the review (scores from -3 to +3), and the number of stars given (1 to 4).

Although the reviews would be a type of unstructured or semi-structured data, after the analysis, the company only stores structured data.
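
A minimal sketch of what "only stores structured data" could look like in Python. The class name `ReviewRecord`, the function `summarize`, and the sample review are hypothetical; the tone score is passed in because the card does not say how the company computes it.

```python
from dataclasses import dataclass

@dataclass
class ReviewRecord:
    """The three fixed fields the company keeps for each review."""
    word_count: int
    tone: int    # -3 to +3
    stars: int   # 1 to 4

def summarize(review_text, tone, stars):
    """Reduce free-form review text to a fixed-field record."""
    if not -3 <= tone <= 3:
        raise ValueError("tone must be between -3 and +3")
    if not 1 <= stars <= 4:
        raise ValueError("stars must be between 1 and 4")
    # Word count here is a simple whitespace split (an assumption).
    return ReviewRecord(len(review_text.split()), tone, stars)

record = summarize("Great blender, but the lid leaks a little.", tone=1, stars=3)
```

The original review text is not kept in the record, which is why the stored result fits neatly into fixed fields.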

robotic process automation (RPA)

Computer software that can be programmed to automatically perform tasks across applications just as human workers do

metadata

Data that describes other data

unstructured data

Data that has no uniform structure

structured data

Data that is highly organized and fits into fixed fields

diagnostic analytics

Information that results from analyses that attempt to determine causal relationships; answers the question "why did this happen?"

predictive analytics

Information that results from analyses that focus on predicting the future; answers the question "what might happen in the future?"

prescriptive analytics

Information that results from analyses that provide a recommendation of what should happen; answers the question "what should be done?"

descriptive analytics

Information that results from the examination of data to understand the past; answers the question "what happened?"

Analyze the question: What do you think Congress should do to reform personal income taxes in this country?

This is an open-ended question that calls for speculation and is certainly not actionable in the current environment. The question is also not specific. It is not relevant and thus should not be reworded or asked in this context.

A call center records all phone calls between employees and customers. The company stores the data so that they can review it if any allegations are made of inappropriate employee behavior.

Recorded phone calls are unstructured data because they have no uniform structure: the recordings vary in length and content and do not fit into fixed fields.

data volume

The amount of data that is created and stored by an organization

data velocity

The pace at which data is created and stored

A company owns a football stadium and takes high definition photos of all fans. The company stores these images and plans eventually to use advanced technologies to see which fans are most likely to wear the team's colors so they can market clothing to them.

The photos are unstructured data because they do not have uniform structure and cannot be easily fit into a relational database (the link to a photo can, but the information about the photo itself would be stored outside the database).

data veracity

The quality or trustworthiness of data

data visualization

The use of a graphical representation of data to convey meaning

Analyze the question: You want to take an aggressive tax position, right?

This is a leading question, which is problematic at the outset because it influences the client to answer in a certain way. The question could be reworded as "Do you want to be aggressive or conservative in your tax compliance?" This question allows for a specific, relevant answer.

A company conducts performance evaluations of all its employees each quarter. The evaluations include comments made by peers of each employee, a supervisor's write-up of performance during the quarter with a rating on a 5-point scale, and performance metrics relative to their job title (e.g., sales completed for salespeople, units repaired for repair people, etc.).

This is a mix of data structures. The scaled items and performance metrics are structured data. The comments and write-up are semi-structured data.

A non-profit organization keeps a list of all donors who have given to their organization in the past. The organization tracks names, dates of donations, amount donated, and additional comments about the donor and their donation.

This is a mix of structured and semi-structured data. The field containing comments is semi-structured; the other items are structured data.

A company scrapes data from a review website where customers can write in about products they have purchased. The company stores each of the written reviews.

This is likely to be semi-structured data because companies often devote a limited amount of space to reviews (e.g., 1,000 characters). The written reviews thus have some organization, but they cannot be fully and easily analyzed.

A university tracks all of the classes that students sign up for each semester. The university records the course number, class description, and course credit hours for each student.

This is structured data because the course number, description, and credit hours are all well defined and fit into fixed fields.

A mechanic keeps a digital catalog of all of the part numbers and part descriptions for each type of vehicle that the company services.

This is structured data because the part number and description are highly organized data that fit into fixed fields.

An online retailer tracks all of the IP addresses of every web visit. The retailer monitors IP addresses to see if visits are coming from IP addresses that are known to hack company websites.

This is structured data because IP addresses follow a defined format and fit into a fixed field.

Analyze the question: How much money do you want to save on taxes?

This question introduces ethical problems because it makes it seem that the practitioner can make any number work for tax expense. Although the answer would be specific and measurable (e.g., $1,000,000), it would not necessarily be achievable because there are important laws to follow. A better question would be "To what extent are you willing to take money-saving, aggressive tax positions that will increase your risk of being audited by the IRS?" Alternatively, appropriate numbers could be introduced, such as "Are you willing to take an aggressive tax position that saves you $1,000 but would increase your risk of an IRS audit by 20%?"

Analyze the question: Why do you pay taxes?

This question is not very specific and will not result in an answer that can be measured or achieved. A better question would be "Do you pay taxes because you feel it is your duty or to avoid penalties? Why?"

text qualifier

Two characters that indicate the beginning and ending of a field and tell the program to ignore any delimiters contained between the characters
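
A short Python sketch of how a delimiter and a text qualifier work together, using the standard `csv` module. The sample line is made up: the comma is the delimiter and the double quote is the text qualifier.

```python
import csv
import io

# Hypothetical record with three fields; the second field contains a comma.
line = 'Smith,"Provo, Utah",42\n'

# Splitting on the delimiter alone breaks the quoted field in two.
naive = line.strip().split(",")
# naive == ['Smith', '"Provo', ' Utah"', '42']

# Telling the parser about the text qualifier keeps the field whole.
parsed = next(csv.reader(io.StringIO(line), delimiter=",", quotechar='"'))
# parsed == ['Smith', 'Provo, Utah', '42']
```

The naive split yields four pieces, while the parser that honors the text qualifier returns the intended three fields.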

data lake

A collection of structured, semi-structured, and unstructured data stored in a single location

analytics mindset

A way of thinking that centers on the correct use of data and analysis for decision making

bot

An autonomous computer program designed to perform a specific task

data swamps

Data repositories that are not accurately documented so that the stored data cannot be properly identified and analyzed

data mart

Data repositories that hold structured data for a subset of an organization

data variety

The different forms data can take

dark data

Information the organization has collected and stored that would be useful for analysis but is not analyzed and is thus generally ignored

automation

The application of machines to automatically perform a task that was once performed by human beings

data storytelling

The process of translating often complex data analyses into easier-to-understand terms to enable better decision making

