Chapter 4: Data & Databases
Data is organized into tables (or relations). Each table has a set of fields which define the structure of the data stored in the table. A record is one instance of a set of fields in a table. To visualize this, think of the records as the rows (or tuples) of the table and the fields as the columns of the table.
What are the characteristics of a relational database?
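A minimal sketch of the table, field, and record vocabulary, using Python's built-in sqlite3 module. The `students` table, its field names, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; the table and field names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students ("
    "  student_id INTEGER PRIMARY KEY,"
    "  name TEXT NOT NULL,"
    "  major TEXT"
    ")"
)

# Each tuple below becomes one record (row) in the table.
conn.executemany(
    "INSERT INTO students (student_id, name, major) VALUES (?, ?, ?)",
    [(1, "Ana", "Biology"), (2, "Ben", "History")],
)

cursor = conn.execute("SELECT * FROM students ORDER BY student_id")
fields = [col[0] for col in cursor.description]  # the columns (fields)
records = cursor.fetchall()                      # the rows (records)

print(fields)
print(records)
```

The fields are fixed when the table is created, while records can be added or removed at any time.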
the process of analyzing data to find previously unknown and interesting trends, patterns, and associations in order to make decisions
What is data mining?
"data about data" ex: # of records, data type of field, size of field, description of field, default value of field, rules of use
What is metadata?
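As an illustration of "data about data", the sketch below asks SQLite to describe a table rather than return its contents. The `products` table is hypothetical; `PRAGMA table_info` is specific to SQLite, and other database systems expose metadata through different means.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    "  product_id INTEGER PRIMARY KEY,"
    "  title TEXT NOT NULL,"
    "  in_stock INTEGER DEFAULT 0"
    ")"
)
conn.execute("INSERT INTO products (product_id, title) VALUES (1, 'Lamp')")

# PRAGMA table_info returns one row per field:
# (cid, name, type, notnull, dflt_value, pk)
field_metadata = {
    row[1]: {"type": row[2], "not_null": bool(row[3]), "default": row[4]}
    for row in conn.execute("PRAGMA table_info(products)")
}

# The number of records is also metadata about the table.
record_count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]

print(field_metadata)
print(record_count)
```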
Unlike a database, a spreadsheet has (1) no control of redundant data, (2) no protection against violations of data integrity, and (3) a reliance on human memory to store and find the needed data.
What is the difference between a spreadsheet and a database? List three differences between them.
Data: raw facts that may be devoid of context or intent; can be quantitative or qualitative. Information: processed data that possesses context, relevance, and purpose; typically involves manipulating raw data to obtain an indication of magnitude, trends, and patterns in the data for a purpose. Knowledge: human beliefs or perceptions about relationships among facts or concepts relevant to a certain area; can be viewed as information that facilitates action.
What is the difference between data, information, and knowledge?
Quantitative data is numeric, the result of a measurement, count, or some other mathematical calculation. Qualitative data is descriptive. "Ruby Red," the color of a 2013 Ford Focus, is an example of qualitative data. A number can be qualitative too: if I tell you my favorite number is 5, that is qualitative data because it is descriptive, not the result of a measurement or mathematical calculation.
What is the difference between quantitative data and qualitative data? In what situations could the number 42 be considered qualitative data?
Primarily used to develop and analyze single-user databases. These databases are not meant to be shared across a network or the Internet, but are instead installed on a particular device and work with a single user at a time.
When would using a personal database management system (DBMS) make sense?
data is structured into tables and all tables must be related to each other through unique identifiers
Why is it important to define the data type of a field when designing a relational database?
Netflix, lists shows related to another show that I watched & enjoyed
Name a database you interact with frequently. What would some of the field names be?
reduce data redundancy & ensure data integrity
Describe what the term normalization means.
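One way to picture normalization is to take a flat list in which customer details repeat on every order and split it so each customer is stored once. The sketch below does this with plain Python dictionaries; the order data and the field names are hypothetical.

```python
# Unnormalized: the customer's email is repeated on every order row.
flat_orders = [
    {"order_id": 1, "customer": "Ana", "email": "ana@example.com", "item": "Lamp"},
    {"order_id": 2, "customer": "Ana", "email": "ana@example.com", "item": "Desk"},
    {"order_id": 3, "customer": "Ben", "email": "ben@example.com", "item": "Chair"},
]

customers = {}     # customer_id -> customer record, stored once
orders = []        # each order refers to a customer by ID only
ids_by_email = {}  # helper lookup used while splitting the data

for row in flat_orders:
    email = row["email"]
    if email not in ids_by_email:
        customer_id = len(ids_by_email) + 1
        ids_by_email[email] = customer_id
        customers[customer_id] = {"name": row["customer"], "email": email}
    orders.append(
        {
            "order_id": row["order_id"],
            "customer_id": ids_by_email[email],
            "item": row["item"],
        }
    )

print(customers)
print(orders)
```

After the split, changing Ana's email means updating one customer record instead of every order she placed, which is how reduced redundancy supports data integrity.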
Almost all software programs require data to do anything useful. For example, if you are editing a document in a word processor such as Microsoft Word, the document you are working on is the data. The word-processing software can manipulate the data: create a new document, duplicate a document, or modify a document. Some other examples of data are: an MP3 music file, a video file, a spreadsheet, a web page, a social media post, and an e-book.
Explain in your own words how the data component relates to the hardware and software components of information systems.
Supervised learning occurs when an organization has data about past activity that has occurred and wants to replicate it. For example, if they want to create a new marketing campaign for a particular product line, they may look at data from past marketing campaigns to see which of their consumers responded most favorably. Once the analysis is done, a machine learning model is created that can be used to identify these new customers. It is called "supervised" learning because we are directing (supervising) the analysis towards a result (in our example: consumers who respond favorably). Supervised learning techniques include analyses such as decision trees, neural networks, classifiers, and logistic regression. Unsupervised learning occurs when an organization has data and wants to understand the relationship(s) between different data points. For example, if a retailer wants to understand purchasing patterns of its customers, an unsupervised learning model can be developed to find out which products are most often purchased together or how to group their customers by purchase history. It is called "unsupervised" learning because no specific outcome is expected. Unsupervised learning techniques include clustering and association rules.
In your own words, explain the difference between supervised learning and unsupervised learning. Give an example of each (not from the book)
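The "products most often purchased together" idea from the answer above can be sketched as simple pair counting, which is the starting point for association rules. This is a toy illustration and not a full association-rule algorithm; the baskets are made-up data. Note that no target outcome is supplied: the code only looks for structure in the purchases.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of products per purchase.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"coffee", "milk"},
    {"bread", "butter", "milk"},
]

# Count how often each pair of products appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)
```

A supervised version of the same retailer problem would instead start from labeled history, such as which customers did or did not respond to a past campaign, and train a model to predict that label.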
1) It uses non-operational data. This means that the data warehouse is using a copy of data from the active databases that the company uses in its day-to-day operations, so the data warehouse must pull data from the existing databases on a regular, scheduled basis. 2) The data is time-variant. This means that whenever data is loaded into the data warehouse, it receives a time stamp, which allows for comparisons between different time periods. 3) The data is standardized. Because the data in a data warehouse usually comes from several different sources, it is possible that the data does not use the same definitions or units. For example, each database uses its own format for dates (e.g., mm/dd/yy, or dd/mm/yy, or yy/mm/dd, etc.). In order for the data warehouse to match up dates, a standard date format would have to be agreed upon and all data loaded into the data warehouse would have to be converted to use this standard format. This process is called extraction-transformation-load (ETL).
Name three advantages of using a data warehouse.
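The date standardization step described in the answer above can be sketched as the "transform" part of ETL. The source system names and their formats below are assumptions for illustration, and ISO 8601 (yyyy-mm-dd) is used as the agreed standard only as an example.

```python
from datetime import datetime

# Hypothetical source systems and the date format each one uses.
SOURCE_FORMATS = {
    "sales_db": "%m/%d/%y",    # mm/dd/yy
    "support_db": "%d/%m/%y",  # dd/mm/yy
    "billing_db": "%y/%m/%d",  # yy/mm/dd
}


def to_standard_date(raw, source):
    """Convert a source-specific date string to yyyy-mm-dd."""
    parsed = datetime.strptime(raw, SOURCE_FORMATS[source])
    return parsed.date().isoformat()


# Extract: the same calendar day as written by three different systems.
extracted = [
    ("sales_db", "03/04/21"),
    ("support_db", "04/03/21"),
    ("billing_db", "21/03/04"),
]

# Transform: convert everything to the standard format before loading.
loaded = [to_standard_date(raw, source) for source, raw in extracted]
print(loaded)
```

Without knowing which source a value came from, "03/04/21" is ambiguous, which is why the conversion has to happen before the data is loaded into the warehouse.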