Week 1 Data exploration

In instances when collecting data from an entire population is challenging, data analysts may choose to use what?

A sample

Text or string data type

A sequence of characters and punctuation that contains textual information

Nominal data

A type of qualitative data that is categorized without a set order. Examples: first-time customer, returning customer, regular customer - new job applicant, existing applicant, internal applicant - new listing, reduced-price listing, foreclosure

Discrete data

Data that is counted and has a limited number of values, such as a rating that accepts only whole stars or points. Examples: number of people who visit a hospital on a daily basis (10, 20, 200) - a room's maximum allowed capacity - tickets sold in the current month

Continuous data

Data that is measured and can have almost any numeric value. Examples: height of kids in third-grade classes (52.5 inches, 65.7 inches) - runtime markers in a video - temperature

Unstructured data

Data that is not organized in any easily identifiable manner

External data

Data that lives and is generated outside of an organization

Internal data

Data that lives within a company's own systems

How data is collected

Interviews - Observations - Forms - Questionnaires - Surveys - Cookies

Which method of data-collection is most commonly used by scientists?

Observations

An entertainment website displays a star rating for a movie based on user reviews. Users can select from one to five whole stars to rate the movie. The star rating is an example of what type of data? Select all that apply.

Ordinal data - Discrete data

Fill in the blank: The running time of a movie is an example of _____ data.

continuous

Logical Data Modeling

Focuses on the technical details of the model, such as relationships, attributes, and entities.

Fill in the blank: In data analytics, a ____ refers to all possible data values in a certain dataset.

population

Physical Data Modeling

Represents an application- and database-specific implementation of a logical data model

Unstructured data examples

Satellite images, photographic data, video data, social media data, text messages, voicemail data

The NOT operator for a Boolean type would be

"IF (Color="Grey") AND (Color=NOT "Pink") then buy them."

The AND operator for a Boolean type would be

"IF (Color="Grey") AND (Color="Pink") then buy them."

The OR operator for a Boolean type would be

"IF (Color="Grey") OR (Color="Pink") then buy them."

Boolean data type

A data type with only two possible values, such as TRUE or FALSE
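The NOT, AND, and OR cards can be sketched in Python. This is only an illustration: it assumes each shoe is described by a set of its colors, and the function names are hypothetical. Every function returns a Boolean, that is, either True or False.

```python
# Each shoe is modeled as a set of color names (an assumption for this sketch).

def has_grey_and_pink(colors: set) -> bool:
    # AND: both conditions must be true
    return "Grey" in colors and "Pink" in colors

def has_grey_or_pink(colors: set) -> bool:
    # OR: at least one condition must be true
    return "Grey" in colors or "Pink" in colors

def has_grey_not_pink(colors: set) -> bool:
    # NOT: the second condition is negated
    return "Grey" in colors and not ("Pink" in colors)

two_tone = {"Grey", "Pink"}
plain_grey = {"Grey"}

print(has_grey_and_pink(two_tone))    # True
print(has_grey_and_pink(plain_grey))  # False
print(has_grey_or_pink(plain_grey))   # True
print(has_grey_not_pink(two_tone))    # False
print(has_grey_not_pink(plain_grey))  # True
```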

Entity Relationship Diagram (ERD)

A diagram that depicts an entity relationship model's entities, attributes, and relations.

Data model

A model that is used for organizing data elements and how they relate to one another

Sample

A part of a population that is representative of the population
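As a rough illustration of drawing a sample from a population, the sketch below uses simple random sampling, which is one common way to aim for a representative sample. The population of 1,000 customer IDs and the sample size of 50 are made-up values.

```python
import random

# Hypothetical population: 1,000 customer IDs
population = list(range(1, 1001))

# A seeded generator makes the draw repeatable
rng = random.Random(42)

# Draw 50 distinct members of the population without replacement
sample = rng.sample(population, k=50)

print(len(sample))                         # 50
print(set(sample) <= set(population))      # True: every sampled ID is in the population
```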

Data type

A specific kind of data attribute that tells what kind of value the data is

Unified Modeling Language (UML)

A standard format for communicating and documenting software design.

Population

All possible data values in a certain dataset

What does data transformation enable data analysts to accomplish?

Change the structure of the data

The data-collection process involves deciding what data to use, determining how much data to collect, and selecting the right data type. Which of the following are also steps in the data-collection process? Select all that apply.

Choosing data sources - Determining the time frame

To track people's online activities and interests, which method of data collection is most effective?

Cookies

Wide data is preferred when

Creating tables and charts with a few variables about each subject - Comparing straightforward line graphs

Second-party data

Data collected by a group directly from its audience and then sold

First-party data

Data collected by an individual or group using their own resources

Third-party data

Data collected from outside sources who did not collect it directly

Long data

Data in which each row is one time point per subject, so each subject will have data in multiple rows

Wide data

Data in which every data subject has a single row with multiple columns to hold the values of various attributes of the subject
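The difference between the two layouts may be easier to see in code. The sketch below reshapes a small wide table into long format using plain Python. The bank names, years, and rates are invented, and `wide_to_long` is a hypothetical helper, not a library function.

```python
# Wide format: one row per subject, one column per year (made-up values)
wide = [
    {"bank": "Bank A", "2020": 1.5, "2021": 1.2},
    {"bank": "Bank B", "2020": 2.0, "2021": 1.8},
]

def wide_to_long(rows, id_key, var_name, value_name):
    """Turn each non-ID column of a wide row into its own long row."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_key:
                continue  # the ID column is repeated on every long row instead
            long_rows.append({id_key: row[id_key], var_name: key, value_name: value})
    return long_rows

long_rows = wide_to_long(wide, "bank", "year", "interest_rate")

for r in long_rows:
    print(r)
# Each bank now appears on two rows, one per year.
```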

Structured data

Data organized in a certain format such as rows and columns

Which of the following is an example of unstructured data?

Email message

When discussing structured databases, data analysts refer to the data contained in a row as a record. How do they refer to the data contained in a column?

Field
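To make the record and field terms concrete, here is a minimal sketch that treats a list of dictionaries as a small structured table. The customer data is invented for illustration.

```python
# A tiny structured table: each dictionary is one row (made-up data)
table = [
    {"customer_id": 1, "name": "Ana", "city": "Lima"},
    {"customer_id": 2, "name": "Ben", "city": "Oslo"},
    {"customer_id": 3, "name": "Chi", "city": "Hanoi"},
]

# A record is all the data in one row
record = table[1]

# A field is all the data in one column
city_field = [row["city"] for row in table]

print(record)      # {'customer_id': 2, 'name': 'Ben', 'city': 'Oslo'}
print(city_field)  # ['Lima', 'Oslo', 'Hanoi']
```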

The power of multiple conditions

For example, if you wanted to filter for shoes that were grey or pink, and waterproof, you could construct a Boolean statement such as: "IF ((Color = "Grey") OR (Color = "Pink")) AND (Waterproof="True")." Notice that you can use parentheses to group your conditions together.
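The grouped statement above can be sketched as a Python filter. The shoe list is made up, and the function names are hypothetical. The second function drops the parentheses to show why grouping matters: in Python, `and` binds more tightly than `or`, so the ungrouped version also accepts grey shoes that are not waterproof.

```python
# Made-up inventory for illustration
shoes = [
    {"name": "A", "color": "Grey", "waterproof": True},
    {"name": "B", "color": "Pink", "waterproof": False},
    {"name": "C", "color": "Pink", "waterproof": True},
    {"name": "D", "color": "Blue", "waterproof": True},
    {"name": "E", "color": "Grey", "waterproof": False},
]

def matches_grouped(shoe):
    # (Grey OR Pink) AND Waterproof
    return (shoe["color"] == "Grey" or shoe["color"] == "Pink") and shoe["waterproof"]

def matches_ungrouped(shoe):
    # Without parentheses this reads as: Grey OR (Pink AND Waterproof)
    return shoe["color"] == "Grey" or shoe["color"] == "Pink" and shoe["waterproof"]

grouped = [s["name"] for s in shoes if matches_grouped(s)]
ungrouped = [s["name"] for s in shoes if matches_ungrouped(s)]

print(grouped)    # ['A', 'C']
print(ungrouped)  # ['A', 'C', 'E']
```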

Data collection considerations

How the data will be collected - Choose data sources - Decide what data to use - How much data to collect - Select the right data type - Determine the time frame

What are the characteristics of unstructured data? Select all that apply.

May have an internal structure - Is not organized in any easily identifiable manner

Data types in spreadsheets

Number - Text or string - Boolean
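As a loose analogy, Python values can be mapped onto the same three spreadsheet categories. `spreadsheet_type` is a hypothetical helper written for this sketch, not part of any spreadsheet tool.

```python
def spreadsheet_type(value):
    """Classify a Python value into one of the three spreadsheet data types."""
    # bool is checked first because Python treats bool as a subclass of int
    if isinstance(value, bool):
        return "Boolean"
    if isinstance(value, (int, float)):
        return "Number"
    if isinstance(value, str):
        return "Text or string"
    return "Unknown"

for value in [42, 65.7, "Grey", True]:
    print(repr(value), "->", spreadsheet_type(value))
```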

Fill in the blank: Internet search engines are an everyday example of how Boolean operators are used. The Boolean operator _____ expands the number of results when used in a keyword search.

OR

Organizations such as the U.S. Centers for Disease Control and Prevention (CDC) often use data collected from hospitals. What kind of data is the CDC using if it is collected by hospitals, then sold to the CDC for its own analysis?

Second-party data

Structured data enables data to be grouped together to form relations. This makes it easier for analysts to do what with the data? Select all that apply.

Store - Analyze - Search

Long data is preferred when

Storing a lot of variables about each subject, for example, 60 years' worth of interest rates for each bank - Performing advanced statistical analysis or graphing

Data Modeling

The process of creating a specific data model for a determined problem domain.

The use of external data is particularly valuable in which circumstances?

When analysis depends on as many data sources as possible

Which of the following statements accurately describes a key difference between wide and long data?

Wide data subjects can have data in multiple columns. Long data subjects can have multiple rows that hold the values of subject attributes.

Conceptual Data Modeling

A detailed model that captures the overall structure of data in an organization

Ordinal data

A type of qualitative data with a set order or scale. Examples: movie ratings (number of stars: 1 star, 2 stars, 3 stars) - ranked-choice voting selections (1st, 2nd, 3rd) - income level (low income, middle income, high income)
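Because ordinal categories have a set order, they can be sorted along their scale, which a plain alphabetical sort does not respect. The sketch below uses the income levels from this card. The response list is made up.

```python
# The scale, from lowest to highest
INCOME_ORDER = ["low income", "middle income", "high income"]

# Made-up survey responses
responses = ["high income", "low income", "middle income", "low income"]

# Sort by each label's position on the scale
ranked = sorted(responses, key=INCOME_ORDER.index)

# Alphabetical order ignores the scale
alphabetical = sorted(responses)

print(ranked)        # ['low income', 'low income', 'middle income', 'high income']
print(alphabetical)  # ['high income', 'low income', 'low income', 'middle income']
```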

