Week 1 Data exploration
In instances when collecting data from an entire population is challenging, data analysts may choose to use what?
A sample
Text or string data type
A sequence of characters and punctuation that contains textual information
Nominal data
A type of qualitative data that is categorized without a set order. Example: First-time customer, returning customer, regular customer - New job applicant, existing applicant, internal applicant - New listing, reduced-price listing, foreclosure
Discrete data
Data that is counted and has a limited number of values, such as a rating that accepts only whole stars or points. Example: Number of people who visit a hospital on a daily basis (10, 20, 200) - Room's maximum capacity allowed - Tickets sold in the current month
Continuous data
Data that is measured and can have almost any numeric value. Example: Height of kids in third grade classes (52.5 inches, 65.7 inches) - Runtime markers in a video - Temperature
Unstructured data
Data that is not organized in any easily identifiable manner
External data
Data that lives and is generated outside of an organization
Internal data
Data that lives within a company's own systems
How data is collected
Interviews, observations, forms, questionnaires, surveys, cookies
Which method of data-collection is most commonly used by scientists?
Observations
An entertainment website displays a star rating for a movie based on user reviews. Users can select from one to five whole stars to rate the movie. The star rating is an example of what type of data? Select all that apply.
Ordinal data, Discrete data
Fill in the blank: The running time of a movie is an example of _____ data.
continuous
Logical Data Modeling
Focuses on the technical details of the model, such as relationships, attributes, and entities.
Fill in the blank: In data analytics, a ____ refers to all possible data values in a certain dataset.
population
Physical Data Modeling
Represents an application-specific and database-specific implementation of a logical data model
Unstructured data examples
Satellite images, photographic data, video data, social media data, text messages, voicemail data
The NOT operator for a Boolean type would be
"IF (Color="Grey") AND (Color=NOT "Pink") then buy them."
The AND operator for a Boolean type would be
"IF (Color="Grey") AND (Color="Pink") then buy them."
The OR operator for a Boolean type would be
"IF (Color="Grey") OR (Color="Pink") then buy them."
Boolean data type
A data type with only two possible values, such as TRUE or FALSE
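The three operators above can be sketched in Python. This is a minimal illustration, not part of the course material: it assumes each shoe is described by a set of color names, and the function names are hypothetical.

```python
# Each shoe is modeled as a set of color names (an assumption for illustration).

def wants_not(colors):
    """NOT: grey shoes that do not contain pink."""
    return "Grey" in colors and not ("Pink" in colors)

def wants_and(colors):
    """AND: shoes that contain both grey and pink."""
    return "Grey" in colors and "Pink" in colors

def wants_or(colors):
    """OR: shoes that contain grey, pink, or both."""
    return "Grey" in colors or "Pink" in colors

grey_only = {"Grey"}
grey_pink = {"Grey", "Pink"}
pink_only = {"Pink"}
black_only = {"Black"}

for shoe in (grey_only, grey_pink, pink_only, black_only):
    # Every result is a Boolean: either True or False.
    print(sorted(shoe), wants_not(shoe), wants_and(shoe), wants_or(shoe))
```

Each function returns only True or False, which is what makes the result a Boolean data type.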
Entity Relationship Diagram (ERD)
A diagram that depicts an entity relationship model's entities, attributes, and relations.
Data model
A model that is used for organizing data elements and how they relate to one another
Sample
A part of a population that is representative of the population
Data type
A specific kind of data attribute that tells what kind of value the data is
Unified Modeling Language (UML)
A standard format for communicating and documenting software design.
Population
All possible data values in a certain dataset
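The relationship between a population and a sample can be sketched in Python. The population here is made-up data (the integers 1 to 1000), and the sample size of 50 and the seed are arbitrary assumptions. A simple random sample is one common way to aim for representativeness, though it does not guarantee it.

```python
import random

# Hypothetical population: every possible value in the dataset.
population = list(range(1, 1001))

# Draw a simple random sample without replacement.
# The seed is fixed only so the sketch is repeatable.
rng = random.Random(42)
sample = rng.sample(population, 50)

population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

print("Population size:", len(population), "mean:", population_mean)
print("Sample size:", len(sample), "mean:", sample_mean)
```

The sample mean will usually be close to, but not exactly equal to, the population mean.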
What does data transformation enable data analysts to accomplish?
Change the structure of the data
The data-collection process involves deciding what data to use, determining how much data to collect, and selecting the right data type. Which of the following are also steps in the data-collection process? Select all that apply.
Choosing data sources, Determining the time frame
To track people's online activities and interests, which method of data collection is most effective?
Cookies
Wide data is preferred when
Creating tables and charts with a few variables about each subject; comparing straightforward line graphs
Second-party data
Data collected by a group directly from its audience and then sold
First-party data
Data collected by an individual or group using their own resources
Third-party data
Data collected from outside sources who did not collect it directly
Long data
Data in which each row is one time point per subject, so each subject will have data in multiple rows
Wide data
Data in which every data subject has a single row with multiple columns to hold the values of various attributes of the subject
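The difference between the two layouts can be sketched by reshaping a small wide table into long form. This is a plain-Python illustration under assumptions: the bank names, years, and rates are made-up values, and `wide_to_long` is a hypothetical helper, not a library function.

```python
# Wide data: one row per subject (bank), one column per year.
wide_rows = [
    {"bank": "Bank A", "2020": 1.5, "2021": 1.2},
    {"bank": "Bank B", "2020": 2.0, "2021": 1.8},
]

def wide_to_long(rows, id_field, var_name, value_name):
    """Turn each non-id column into its own row, keeping the subject id."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_field:
                continue  # the id is repeated on every long row instead
            long_rows.append(
                {id_field: row[id_field], var_name: column, value_name: value}
            )
    return long_rows

# Long data: one row per subject per time point.
long_rows = wide_to_long(wide_rows, "bank", "year", "rate")
for row in long_rows:
    print(row)
```

Two wide rows with two year columns each become four long rows, so each bank now appears in multiple rows.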
Structured data
Data organized in a certain format such as rows and columns
Which of the following is an example of unstructured data?
Email message
When discussing structured databases, data analysts refer to the data contained in a row as a record. How do they refer to the data contained in a column?
Field
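Records and fields can be pictured with a small table in Python. This is only a sketch: the table is represented as a list of dictionaries, and the customer names and cities are invented for illustration.

```python
# A tiny structured table: each dictionary is one row.
table = [
    {"customer_id": 1, "name": "Ana", "city": "Lima"},
    {"customer_id": 2, "name": "Ben", "city": "Oslo"},
    {"customer_id": 3, "name": "Chi", "city": "Hanoi"},
]

# A record is all the data in one row.
record = table[0]

# A field is all the data in one column, collected across every row.
city_field = [row["city"] for row in table]

print("Record:", record)
print("Field:", city_field)
```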
The power of multiple conditions
For example, if you wanted to filter for shoes that were grey or pink, and waterproof, you could construct a Boolean statement such as: "IF ((Color = "Grey") OR (Color = "Pink")) AND (Waterproof="True")." Notice that you can use parentheses to group your conditions together.
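The grouped statement above can be sketched as a filter in Python. The shoe list and its field names are assumptions made up for illustration. The second function drops the parentheses to show why grouping matters: in Python, `and` is evaluated before `or`.

```python
# Hypothetical inventory; names, colors, and flags are invented.
shoes = [
    {"name": "A", "color": "Grey", "waterproof": True},
    {"name": "B", "color": "Pink", "waterproof": False},
    {"name": "C", "color": "Pink", "waterproof": True},
    {"name": "D", "color": "Black", "waterproof": True},
    {"name": "E", "color": "Grey", "waterproof": False},
]

def matches_grouped(shoe):
    """(Grey OR Pink) AND waterproof."""
    return (shoe["color"] == "Grey" or shoe["color"] == "Pink") and shoe["waterproof"]

def matches_ungrouped(shoe):
    """Grey OR (Pink AND waterproof): what Python evaluates without parentheses."""
    return shoe["color"] == "Grey" or shoe["color"] == "Pink" and shoe["waterproof"]

grouped = [s["name"] for s in shoes if matches_grouped(s)]
ungrouped = [s["name"] for s in shoes if matches_ungrouped(s)]

print("Grouped:", grouped)      # ['A', 'C']
print("Ungrouped:", ungrouped)  # ['A', 'C', 'E']
```

Shoe E is grey but not waterproof, so it passes only the ungrouped version.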
Data collection considerations
How the data will be collected, Choose data sources, Decide what data to use, How much data to collect, Select the right data type, Determine the time frame
What are the characteristics of unstructured data? Select all that apply.
May have an internal structure, Is not organized
Data types in spreadsheets
Number, Text or string, Boolean
Fill in the blank: Internet search engines are an everyday example of how Boolean operators are used. The Boolean operator _____ expands the number of results when used in a keyword search.
OR
Organizations such as the U.S. Centers for Disease Control and Prevention (CDC) often use data collected from hospitals. What kind of data is the CDC using if it is collected by hospitals, then sold to the CDC for its own analysis?
Second-party data
Structured data enables data to be grouped together to form relations. This makes it easier for analysts to do what with the data? Select all that apply.
Store, Analyze, Search
Long data is preferred when
Storing a lot of variables about each subject, for example 60 years' worth of interest rates for each bank; performing advanced statistical analysis or graphing
Data Modeling
The process of creating a specific data model for a determined problem domain.
The use of external data is particularly valuable in which circumstances?
When analysis depends on as many data sources as possible
Which of the following statements accurately describes a key difference between wide and long data?
Wide data subjects can have data in multiple columns. Long data subjects can have multiple rows that hold the values of subject attributes.
Conceptual Data Modeling
A high-level model that captures the overall structure of data in an organization
Ordinal data
A type of qualitative data with a set order or scale. Example: Movie ratings (number of stars: 1 star, 2 stars, 3 stars) - Ranked-choice voting selections (1st, 2nd, 3rd) - Income level (low income, middle income, high income)