Lesson 5: Data Analytics Tools and Techniques


What is a boxplot?

A boxplot provides a concise summary of the quartiles of numerical data (i.e., the cut points that divide the sorted data into four segments, each containing about 25% of the values). This graph is also convenient for detecting outliers and skewness.
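
The following is a minimal Python sketch of the numbers a boxplot draws. It assumes Tukey's common 1.5 × IQR rule for flagging outliers (a widespread convention, not something this lesson specifies) and uses the standard-library `statistics.quantiles` with `method="inclusive"`. Other tools may compute quartiles slightly differently. The `boxplot_summary` helper and the data are hypothetical.

```python
import statistics

def boxplot_summary(values):
    """Return the five-number summary plus outliers, using Tukey's 1.5*IQR fences."""
    # With n=4, statistics.quantiles returns the three quartile cut points Q1, Q2, Q3.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range: the height of the box
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    # Whiskers reach the most extreme values that are still inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "min": min(inside), "q1": q1, "median": q2, "q3": q3, "max": max(inside),
        "outliers": outliers,
    }

data = [2, 3, 3, 4, 5, 5, 6, 7, 8, 40]  # hypothetical sample with one extreme value
print(boxplot_summary(data))
```

Here the value 40 lies far above the upper fence, so it would be drawn as a separate point rather than stretching the whisker.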

What is Artificial Intelligence:

A computer program that can perform tasks that usually only humans can perform, such as translation and voice recognition.

Describe a decision tree:

A decision support tool that uses a tree-like model of decisions and their possible consequences, including chance outcomes, resource costs, and utility.
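
One standard way to evaluate such a tree is to "roll it back": compute the probability-weighted (expected) payoff of each chance node, subtract the resource cost of the decision, and choose the branch with the highest value. The sketch below assumes payoffs stand in for utility. The option names, probabilities, payoffs, and the `expected_value`/`best_decision` helpers are all hypothetical.

```python
# Each decision option leads to a chance node: a list of (probability, payoff) outcomes.
# All numbers are hypothetical, for illustration only.
options = {
    "launch product": {"cost": 100, "outcomes": [(0.6, 500), (0.4, -200)]},
    "do nothing": {"cost": 0, "outcomes": [(1.0, 0)]},
}

def expected_value(option):
    """Probability-weighted payoff of the chance node, minus the decision's resource cost."""
    return sum(p * payoff for p, payoff in option["outcomes"]) - option["cost"]

def best_decision(options):
    # Roll back the tree: pick the branch with the highest expected value.
    return max(options, key=lambda name: expected_value(options[name]))

for name, opt in options.items():
    print(f"{name}: EV = {expected_value(opt):.1f}")
print("Choose:", best_decision(options))
```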

What is a heatmap?

A heatmap is a graph that uses a range of colors to show frequency or interaction visually. A common convention is red for the most frequent/popular outcomes and blue for the least frequent/popular outcomes.

What is a relational database?

A relational database is a collection of data items with pre-defined relationships between them. These items are organized as a set of tables with columns and rows.

What is a scatterplot?

A scatterplot is a two-dimensional graph that is well suited to visualizing correlation or relationships. Each dot on the scatterplot represents one observation of two numerical variables of interest.

What is hierarchical clustering:

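An algorithm that groups similar objects into groups called clusters, building a hierarchy of clusters by repeatedly merging the closest ones (or splitting larger ones).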
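
Below is a minimal sketch of the bottom-up (agglomerative) variant. It assumes single linkage, where the distance between two clusters is the distance between their closest members, and uses 1-D points to keep it short. The `agglomerative` function and the data are hypothetical. Real tools offer other linkage rules and build a full dendrogram.

```python
def agglomerative(points, num_clusters):
    """Single-linkage agglomerative clustering of 1-D points.

    Starts with every point in its own cluster and repeatedly merges the two
    closest clusters until `num_clusters` remain. The recorded merges are the
    levels of the hierarchy.
    """
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > num_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members of the two clusters.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merges

clusters, merges = agglomerative([1.0, 1.2, 5.0, 5.3, 9.8], num_clusters=3)
print(clusters)
```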

What is the classification graph technique:

Classification is a technique in which the analyst wants to assign an item to a specific category based on various conditions: compare the item with the known groups, locate the closest match, then assign it to that group. It attempts to identify an unknown object among known groups. Examples: spam detection, cancer detection.
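
The "compare, locate, assign" idea can be sketched with a nearest-centroid classifier, one of the simplest classification methods (swapped in here for illustration; the lesson does not name a specific algorithm). The two features (number of links, number of exclamation marks), the training emails, and the helper names are hypothetical.

```python
import math

# Hypothetical labeled training emails: (links, exclamation_marks) -> known group.
training = [
    ((8, 6), "spam"), ((10, 9), "spam"), ((7, 8), "spam"),
    ((0, 1), "not spam"), ((1, 0), "not spam"), ((2, 1), "not spam"),
]

def centroids(rows):
    """Average feature vector (centroid) of each known group."""
    sums, counts = {}, {}
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, value in enumerate(features):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def classify(features, model):
    # Compare the unknown item with each group's centroid and assign it to the closest one.
    return min(model, key=lambda label: math.dist(features, model[label]))

model = centroids(training)
print(classify((9, 7), model))  # spam
print(classify((1, 2), model))  # not spam
```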

Name three other data analytic applications:

D3 (D3.js), SPSS, ETL (extract, transform, load) tools

What is the clustering graph technique:

The groupings are unknown, and the analyst wishes to determine whether the objects belong to any groups and, if so, how many groups exist. Example: search results for "apple" must be assigned to groups such as the company, the fruit, etc.
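
Hierarchical clustering is one approach; k-means is another common one, shown below as a minimal sketch on 1-D data. Because the number of groups is unknown, the loop tries several values of k and prints the within-cluster sum of squares, which is the basis of the common "elbow" heuristic (a judgment aid, not a definitive test). The deterministic initialization, the helper names, and the data are hypothetical choices for illustration.

```python
def kmeans_1d(values, k, iterations=20):
    """Basic k-means on 1-D data with a simple deterministic initialization."""
    ordered = sorted(values)
    step = len(ordered) / k
    centers = [ordered[int(i * step)] for i in range(k)]  # spread initial centers over the data
    for _ in range(iterations):
        # Assignment step: put each value in the group of its nearest center.
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group (keep it if the group is empty).
        new_centers = [sum(g) / len(g) if g else centers[c] for c, g in enumerate(groups)]
        if new_centers == centers:
            break  # converged: assignments no longer change
        centers = new_centers
    return centers, groups

def within_cluster_ss(centers, groups):
    """Total squared distance from each value to its group's center (lower = tighter groups)."""
    return sum((v - c) ** 2 for c, g in zip(centers, groups) for v in g)

prices = [1, 2, 2, 3, 10, 11, 12, 30, 31]  # hypothetical values
for k in (1, 2, 3):
    centers, groups = kmeans_1d(prices, k)
    print(k, groups, round(within_cluster_ss(centers, groups), 1))
```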

What is the Principal Component Analysis (PCA)?

This technique attempts to group variables into meaningful groups: the analyst tries to find out whether the variables themselves group in any meaningful way. PCA is often used for data reduction, collapsing many related variables into a smaller number of components.

What is a time series technique?

Time series is a technique that looks for trends in data over time. It is essential for determining whether patterns in the data are true trends that should be recognized.

How are Python and R used?

Python and R are languages used to perform statistical analysis on large amounts of data.

Define Descriptive Analytics:

What has happened. Provides a holistic view of the performance and trends based on historical and benchmarking data.

Define Predictive Analytics:

What might happen. Uses statistics and graphs such as histograms to predict outcomes. Example: using past medical records to predict future health.

Define Prescriptive Analytics:

What should happen. Using the data, predict what will happen if no change is made in the organization, or predict the outcome of tweaking something.

What are neural networks:

A computer system modeled on the human brain and nervous system. It adapts to change quickly. Neural networks are a machine learning technique.
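
The basic building block of a neural network is an artificial neuron: it computes a weighted sum of its inputs and "fires" if the sum crosses a threshold. Learning means adjusting the connection weights after each mistake. The sketch below trains a single neuron (the classic perceptron rule) to reproduce logical AND. It illustrates one neuron only, not a full multi-layer network, and the helper names, learning rate, and epoch count are hypothetical choices.

```python
def train_perceptron(samples, epochs=10, learning_rate=1.0):
    """Train one artificial neuron with the classic perceptron learning rule.

    samples: list of (inputs, target) pairs, with target 0 or 1.
    """
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # Weighted sum of inputs passed through a step activation.
            output = predict(weights, bias, inputs)
            error = target - output
            # Nudge each connection weight in the direction that reduces the error.
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

def predict(weights, bias, inputs):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

# Learn the logical AND function from examples.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(and_samples)
print([predict(w, b, x) for x, _ in and_samples])  # [0, 0, 0, 1]
```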

What is a histogram:

A diagram consisting of rectangles whose area is proportional to the frequency of a variable and whose width is equal to the class interval. It can show skewness and outliers.
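
The counting step behind a histogram can be sketched as follows. Each value is placed in the class interval it falls into, and with equal widths the bar heights are simply the counts. The `histogram_counts` helper, the interval choices, and the ages are hypothetical, and values outside the chosen range are simply ignored in this sketch.

```python
def histogram_counts(values, start, width, num_bins):
    """Count values in each class interval [start + i*width, start + (i+1)*width)."""
    counts = [0] * num_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < num_bins:  # values outside the range are ignored here
            counts[index] += 1
    return counts

ages = [21, 23, 25, 27, 31, 33, 34, 38, 45, 62]  # hypothetical integer ages
counts = histogram_counts(ages, start=20, width=10, num_bins=5)
for i, count in enumerate(counts):
    low = 20 + i * 10
    print(f"{low}-{low + 9}: {'#' * count}")
```

The long, sparse tail on the right (including an empty 50-59 interval before 62) is how right skew and a possible outlier show up in the bars.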

What is machine learning:

Machine learning is the study of computer algorithms that improve automatically through experience. It is seen as a part of artificial intelligence.

What is qualitative data:

Data that is descriptive rather than numerical: observations, opinions, comments

What is quantitative data:

Data that is numerical and systematically measured or counted: annual sales, review ratings

Name the three types of analytic models: (DPP)

Descriptive, Predictive, Prescriptive

Name 4 types of popular graphs:

Histogram, boxplot, heatmap, scatterplot

What is the regression graph technique:

This is a technique that allows us to predict an outcome (either numerical or categorical) based on a set of predictor variables. One might think of this process as providing an output given a set of input variables. Typical outcomes to predict: housing prices, sales, a person's weight. As another example, an analyst might predict customer churn based on various customer demographic data.

What is SQL and how is it used?

Structured Query Language (SQL). SQL is a language that allows an analyst to interface with relational databases.
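
The sketch below runs SQL from Python against an in-memory SQLite database using the standard-library `sqlite3` module. It creates two related tables (each order points to a customer) and uses a `JOIN` with `GROUP BY` to total orders per customer. The schema, names, and amounts are hypothetical.

```python
import sqlite3

# In-memory database with two related tables (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL
    );
    INSERT INTO customers (id, name) VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO orders (customer_id, amount) VALUES (1, 20.0), (1, 35.5), (2, 12.0);
""")

# JOIN follows the orders.customer_id -> customers.id relationship;
# GROUP BY aggregates the orders of each customer.
rows = conn.execute("""
    SELECT c.name, COUNT(o.id) AS num_orders, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY total DESC
""").fetchall()
print(rows)  # [('Ana', 2, 55.5), ('Ben', 1, 12.0)]
conn.close()
```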

List the 3 data file types: (SUS)

Structured, unstructured, semi-structured

What is Tableau used for?

Tableau can help anyone see and understand their data via visual dashboards. An analyst can connect to almost any database, drag and drop to create visualizations, and share their creations with other stakeholders easily.

What is JSON and what data source is it used with?

JavaScript Object Notation. This format is very useful when pulling data from an external source such as a web server. Because JSON is text only, it can easily be sent to and from a server, and because it is semi-structured, it can easily be ingested into a structured database. For example, a business might use the Twitter API to pull in Twitter data about the company in JSON format and store it in a database for later analysis.
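
The sketch below shows that flow with the standard-library `json` and `sqlite3` modules. It parses a JSON text payload, flattens the nested and optional fields into fixed columns, and loads the rows into a table. The payload is a hypothetical, simplified stand-in for an API response and does not reproduce the format of any real API.

```python
import json
import sqlite3

# Hypothetical API response body (text), loosely shaped like social-media post data.
payload = """
[
  {"id": "101", "text": "Loving the new release!", "user": {"handle": "ana"}, "likes": 12},
  {"id": "102", "text": "Support was slow today.", "user": {"handle": "ben"}}
]
"""

posts = json.loads(payload)  # JSON text -> Python lists and dicts

# Semi-structured: fields can be nested or missing, so flatten them and fill
# defaults before loading into a fixed set of columns.
rows = [(p["id"], p["user"]["handle"], p["text"], p.get("likes", 0)) for p in posts]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id TEXT PRIMARY KEY, handle TEXT, text TEXT, likes INTEGER)")
conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?)", rows)
print(conn.execute("SELECT handle, likes FROM posts ORDER BY id").fetchall())
```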

Define semi-structured data:

Key Characteristic: Loosely organized into categories using meta tags. Typical File Types: JSON, XML, Email, Web pages

Define structured data:

Key Characteristic: Often numbers or labels, stored in a structured framework of columns and rows relating to pre-set parameters. Typical File Types: Databases

Define unstructured data:

Key Characteristic: Text-heavy information that's not organized in a clearly defined framework or model. Typical File Types: Audio, Video, Image data, Natural Language, Documents

What is Mean, Median, Mode, and Standard Deviation:

Mean: the average. Median: the middle value. Mode: the value that occurs most often. SD (standard deviation): how much the values vary around the mean (e.g., for 2, 4, 6 the sample SD is 2).
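
These statistics are available in Python's standard-library `statistics` module. Note that the SD of 2 for the values 2, 4, 6 is the sample standard deviation (`stdev`, dividing by n - 1); the population version (`pstdev`, dividing by n) gives a smaller value. The mode example uses a separate hypothetical list, since 2, 4, 6 has no repeated value.

```python
import statistics

values = [2, 4, 6]
print(statistics.mean(values))    # 4
print(statistics.median(values))  # 4
print(statistics.stdev(values))   # 2.0 (sample SD, divides by n - 1)
print(statistics.pstdev(values))  # population SD, divides by n (about 1.633)

print(statistics.mode([3, 7, 7, 9]))  # 7
```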

Describe anomaly detection methods:

Anomaly (outlier) detection is the process of identifying unexpected items or events in data sets that differ from the norm. Anomalous data can indicate critical incidents, such as a technical glitch, or potential opportunities, for instance a change in consumer behavior. A scatterplot can be used to spot outliers visually.
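
A simple numerical anomaly detector is the z-score rule: flag any value that lies more than a chosen number of standard deviations from the mean. This sketch is one of many possible methods and is not the lesson's prescribed approach. The threshold and the login counts are hypothetical.

```python
import statistics

def zscore_anomalies(values, threshold=3.0):
    """Flag values whose z-score (distance from the mean in SDs) exceeds the threshold."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / sd > threshold]

daily_logins = [120, 118, 125, 122, 119, 121, 117, 124, 123, 410]  # hypothetical
# The outlier itself inflates the mean and SD: with 10 values, no z-score based on
# the sample SD can exceed about 2.85, so a cutoff of 3 could never fire here.
print(zscore_anomalies(daily_logins, threshold=2.5))  # [410]
```

For small samples, a robust alternative such as the boxplot's IQR fences is often preferred, because it is not pulled toward the outlier.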

Differentiate between scripting and programming languages:

Programming languages use a compiler to convert the high-level source code into machine code (Java is compiled to bytecode that runs on the Java Virtual Machine). Scripting languages use an interpreter that executes the code directly. Java = programming; Python, R = scripting.

List the 2 data types: QQ

Quantitative, Qualitative

Name 5 types of graph techniques (RCC)

Regression, classification, clustering, time series, and PCA (grouping).

List 6 data sources:

Scraping data, open-source data, passive collection, creating data, application programming interfaces (APIs), in-house data. Sources can also be grouped as internal or external.

