Data Analytics Journey

A software intermediary that allows two applications to talk to each other. In other words, it is the messenger that delivers your request to the provider you are requesting it from and then delivers the response back to you (e.g., pay with PayPal).

API

It involves being able to listen to others with understanding and empathy.

Active Listening

Is the identification of rare items, events, or observations in a dataset that differ from the norm or raise suspicions. It can be used to detect fraud, intrusions, outliers, technical glitches, etc. in a dataset.

Anomaly Detection
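
One common way to flag such rare observations is a z-score rule. The sketch below is only an illustration under stated assumptions: the `find_anomalies` helper, the threshold of 2 standard deviations, and the transaction amounts are all hypothetical, and other detection methods may suit real data better.

```python
from statistics import mean, stdev

def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical daily transaction amounts with one suspicious spike.
amounts = [52, 48, 50, 51, 49, 47, 53, 50, 250]
suspicious = find_anomalies(amounts)
print(suspicious)  # [250]
```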

API stands for

Application Programming Interface

The development of smart machines capable of performing tasks that typically require human intelligence.

Artificial Intelligence

Gives the after-the-data (posterior) probability of a hypothesis, given the observed data, as a function of the likelihood of the data under that hypothesis, the prior probability of the hypothesis, and the overall probability of getting the data you found.

Bayes' Theorem
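
The relationship between prior, likelihood, and posterior can be sketched numerically. This is a minimal illustration for a yes/no hypothesis; the function name `posterior`, its parameter names, and the fraud-detection numbers are assumptions made up for the example.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | D) for a binary hypothesis H, via Bayes' theorem."""
    # P(D) = P(D | H) * P(H) + P(D | not H) * P(not H)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 1% of transactions are fraudulent, the detector
# flags 90% of fraudulent ones and 5% of legitimate ones.
p = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(round(p, 3))  # 0.154
```

Even with a fairly accurate detector, a flagged transaction is fraudulent only about 15% of the time here, because fraud is rare to begin with.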

Provides a concise summary of the quartiles of numerical data (i.e., cut points that divide the data into four segments, each holding 25% of the values). This graph is also convenient for detecting outliers and skewness.

Boxplot
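
The numbers a boxplot draws can be computed directly. The sketch below assumes the "inclusive" quartile method from Python's `statistics` module and the common 1.5 x IQR convention for flagging outliers; neither choice comes from the card itself, and the `scores` data are hypothetical.

```python
from statistics import quantiles

def five_number_summary(values):
    """Minimum, first quartile, median, third quartile, maximum."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

def outliers(values):
    """Values beyond 1.5 * IQR from the quartiles (a common convention)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # hypothetical data
print(five_number_summary(scores))  # (1, 3.25, 5.5, 7.75, 30)
print(outliers(scores))             # [30]
```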

An analyst defines the major questions of interest that need to be answered, determines the needs of the stakeholders, assesses the resource constraints of the project, and defines project outcomes. Which phase of the data analytics life cycle is this?

Business Understanding/Discovery Phase

Scope Statement, Stakeholder Register, Gantt Chart, and Network Diagram are all tools used in which phase of the data analytics life cycle?

Business Understanding/Discovery Phase

A technique in which the analyst wants to assign an item to a specific category based on various conditions.

Classification

Lack of ______________ on stakeholders, timeline, limitations, and budget could potentially derail an analysis.

Clear focus

Groups are unknown and the analyst wishes to determine whether the objects fall into any groups. An example is when data on search queries are analyzed to determine whether they group in a particular way and how many groups exist. Examples include genome patterns, Google News, and point cloud processing.

Clustering
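
As a rough illustration of grouping unlabeled values, here is a one-dimensional k-means sketch. It is a simplification under stated assumptions: k-means needs the number of groups chosen up front (here via the starting `centers`), and the function name and data are hypothetical.

```python
def k_means_1d(values, centers, rounds=10):
    """Assign each value to its nearest center, then move each center
    to the mean of its group; repeat for a fixed number of rounds."""
    groups = []
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

centers, groups = k_means_1d([1, 2, 3, 10, 11, 12], centers=[0, 5])
print(centers)  # [2.0, 11.0]
print(groups)   # [[1, 2, 3], [10, 11, 12]]
```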

Means creating meaningful dialogue together that focuses on the problem, opportunity, and solution. Participants can use diagrams, charts, and visuals. The strategy aims at bringing together different groups of people and third parties to assist with a project or product development (example tools could be Google Docs, Slack, Microsoft Teams, etc.).

Co-creation

In the context of a data framework, ___________ refers to acting unethically or compromising the analysis so that it leans toward favorable results.

Conflict of Interest

A delay in the ______ activities could delay the project.

Critical Path

The longest path of activities on a project, or the minimum time necessary to complete all project work.

Critical Path
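
The "longest path of activities" idea can be sketched as a small computation. The activity names, durations, and dependencies below are hypothetical, and the sketch assumes the dependencies contain no cycles.

```python
from functools import lru_cache

# Hypothetical activities: name -> (duration in days, predecessors)
activities = {
    "define scope": (2, []),
    "acquire data": (5, ["define scope"]),
    "clean data": (4, ["acquire data"]),
    "build dashboard shell": (3, ["define scope"]),
    "report": (2, ["clean data", "build dashboard shell"]),
}

@lru_cache(maxsize=None)
def earliest_finish(name):
    """Earliest day an activity can finish, given its predecessors."""
    duration, predecessors = activities[name]
    return duration + max((earliest_finish(p) for p in predecessors), default=0)

def critical_path():
    """Walk back from the last-finishing activity through the
    latest-finishing predecessor at each step."""
    current = max(activities, key=earliest_finish)
    path = [current]
    while activities[current][1]:
        current = max(activities[current][1], key=earliest_finish)
        path.append(current)
    return list(reversed(path)), earliest_finish(path[0])

path, total_days = critical_path()
print(path)        # ['define scope', 'acquire data', 'clean data', 'report']
print(total_days)  # 13
```

A delay in any activity on that path pushes out the 13-day total, while "build dashboard shell" has slack.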

Is a JavaScript library for manipulating documents based on data. Helps bring data to life using HTML, SVG and CSS.

D3.js (Data-Driven Documents)

The data collection phase. Data is collected and stored for easy retrieval from a database, perhaps a component of a data warehouse, by using a language like SQL. Web scraping and surveys can also be used to acquire data. Which phase of the data analytics life cycle is this?

Data Acquisition

SQL, Web Scraping Software, Surveys, Input Data (Self-Generated Data), and NoSQL (used to collect unstructured data) are all tools used in which phase of the data analytics life cycle?

Data Acquisition Phase

The role in the workplace in a data analytic project that obtains and cleans data, displays data in reports, and searches for trends and outliers.

Data Analyst

Also known as data cleansing, data wrangling, data munging, and feature engineering. The analyst uses SQL, Python, R, or Excel to perform data modifications and transformations. Which phase of the data analytics life cycle is this?

Data Cleaning

Python, R, SQL, Excel are all tools used in which phase of the data analytics life cycle?

Data Cleaning Phase

Analyst begins to understand the basic nature of data, the relationships within it (between data variables), and the structure of the dataset, the presence of outliers, and the distribution of data values. This phase uses data visualization tools and numerical summaries such as measures of central tendency and variability. Which phase of the data analytics life cycle is this?

Data Exploration

Distributions (normal or skewed curve), visualization tools (Tableau, R, Python, RStudio, and histograms), and statistical tools (such as mean, median, and mode) are all tools used in which phase of the data analytics life cycle?

Data Exploration

Looks for patterns in large sets of data. Tools are Python and R. Also called machine learning. A specialized segment of data mining techniques that continually update to improve modeling over time. Which phase of the data analytics life cycle is this?

Data Mining Phase

Simply reducing the amount or volume of data in a storage system or database. One of the goals is to optimize storage capacity.

Data Reduction

Dashboards, Tableau, Storytelling (a feature of Tableau), graphs, charts, images, histograms, etc. are all tools used in which phase of the data analytics life cycle?

Data Reporting

The analyst tells the story of the data and uses graphs or interactive dashboards to inform others of findings from analyses. Tools such as Tableau are used to spot trends and patterns. The goal is to give actionable insight to stakeholders. Which phase of the data analytics life cycle is this?

Data Reporting Phase

A tree-like model of alternative decisions and their consequences. It is a sequence of binary decisions based on your data that combine to predict an outcome. It branches out from one decision to the next.

Decision trees.
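
A sequence of binary decisions can be written out by hand as nested conditions. The rules, thresholds, and the `loan_decision` name below are invented for illustration; a real decision tree would learn its splits from data.

```python
def loan_decision(income, has_defaulted, years_employed):
    """Each branch is one yes/no question; the leaves are the outcomes."""
    if has_defaulted:
        return "decline"
    if income >= 50_000:
        return "approve"
    if years_employed >= 3:
        return "approve"
    return "review"

print(loan_decision(income=42_000, has_defaulted=False, years_employed=5))  # approve
```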

Breaking trend over time into components; its procedures are used in time series to describe the reasons for variations in trend.

Decomposition

A type of neural network capable of performing text classification. Also, a type of recurrent neural network (RNN) that works best on sequential data.

Deep Learning

The ability for information in digital format to be accessible to the average end user. One of the goals is to allow non-specialists to access data without technical requirements. It means that everyone should have access to the data and that no gatekeeper can create a bottleneck to the data.

Democratization

The interpretation of historical data to better explain market developments. Which type of analytics is this?

Descriptive

Which type of analytics asks the question, "What happened?"

Descriptive

It enables the extraction of value from data by posing the right questions and conducting in-depth investigations into the problems. Which type of analytics is this?

Diagnostic

Which type of analytics asks the question, "Why did it happen?"

Diagnostic

Reduces the number of variables and the amount of data. You will deal with a single score and not multiple scores or a lot of data. It uses techniques such as Principal Component Analysis (PCA), Factor Analysis, and Feature Selection.

Dimensionality reduction

Is a type of data integration that is used to blend data from several sources. It's often used to build a data warehouse.

ETL
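
The three ETL steps can be sketched with the standard library alone. Everything specific here is an assumption for illustration: the inline CSV text stands in for a real source, the `sales` table and its columns are hypothetical, and an in-memory SQLite database plays the role of the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this might be files or an API.
RAW_CSV = "name,amount\nalice, 10.5 \nbob,n/a\ncarol,7\n"

def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: tidy names and drop rows with a non-numeric amount."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"].strip())
        except ValueError:
            continue
        clean.append((row["name"].strip().title(), amount))
    return clean

def load(rows, connection):
    """Load: write the cleaned rows into the target table."""
    connection.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
    connection.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    connection.commit()

warehouse = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), warehouse)
total = warehouse.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)  # 17.5
```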

Another version of ETL; tends to load anything and everything into a warehouse or a data lake, from where it can be analyzed at a later point in time.

ETLTL

Persuasion, verbal communication, non-verbal communication, active listening, problem-solving, and decision-making are all examples of:

Effective interpersonal communication skills.

XML stands for

Extensible Markup Language

ETLTL stands for

Extract, Transform, Load, Transform, and Load.

ETL stands for

Extract, Transform, and Load.

How does one define research questions within an organization?

Formulate questions that align with the organizational needs.

A colorful graph that visually shows frequency or interaction using a range of colors (red is used for the highest frequency, blue for the lowest).

Heatmap

Algorithm that groups similar objects into groups that are called clusters.

Hierarchical Clustering.

A simple and commonly used plot to quickly check the distribution of a sample set. The data is divided into a pre-specified number of groups called bins. The data is then sorted into each bin and the count of the number of observations in each bin is retained. It helps show outliers in data and skewness.

Histogram
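
The binning step described above can be sketched as follows. The sketch assumes equal-width bins between the minimum and maximum and that the values are not all identical; the `histogram` helper and the `ages` data are hypothetical.

```python
def histogram(values, bins):
    """Count how many values fall into each of `bins` equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value belongs to the last bin.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return counts

ages = [21, 22, 23, 25, 26, 31, 33, 38, 41, 60]
counts = histogram(ages, bins=4)
for i, count in enumerate(counts):
    print(f"bin {i}: {'#' * count}")
```

The counts come out as 5, 3, 1, 1: a long right tail, with the lone value 60 standing apart from the rest.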

The ___________ shows in graphical form the project constraints of Time, Cost, and Scope. Quality is the central theme, at the midpoint. If you make a change to one constraint, the other two need to be adjusted accordingly; otherwise, quality will suffer.

Iron Triangle

A lightweight format for storing and transporting data on networks. Also an open standard file format, and data interchange format, that uses human-readable text to store and transmit data objects consisting of attribute-value pairs and array data types.

JSON
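
A short sketch of attribute-value pairs and arrays in practice, using Python's built-in `json` module. The record contents are made up for the example.

```python
import json

# Human-readable text holding attribute-value pairs and an array.
text = '{"name": "Ada", "skills": ["SQL", "Python"], "active": true}'

record = json.loads(text)          # text -> Python dict
record["skills"].append("R")       # work with it as ordinary data
round_trip = json.dumps(record, sort_keys=True)  # dict -> text again

print(record["active"])  # True
print(round_trip)
```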

JSON stands for

JavaScript Object Notation

Is an array of services that provide machine learning tools as part of cloud computing services. Helps clients benefit from machine learning without the associated cost, time, and risk of establishing an in-house machine learning team.

MLaaS (Machine Learning as a Service)

Involves using algorithms and statistical models to analyze and draw inferences from patterns in data. Focuses on the development of computer programs that can access data and use it to learn for themselves.

Machine Learning

MLaaS stands for

Machine Learning as a Service

Named after Thomas Bayes, it is an algorithm that applies Bayes' theorem to estimate the conditional probability of an outcome.

Naive Bayes

An algorithm that mimics the operations of the human brain to recognize relationships in vast amounts of data. It is modeled roughly after the neurons inside the biological brain: nodes act like on/off switches that relate to each other. Very basic pieces of information are connected across many nodes, and together they produce very high-level cognitive decisions and classifications.

Neural Networks

A symmetrical curve centered around the mean. Its data follow the empirical rule, which states that fixed percentages of the data set (about 68%, 95%, and 99.7%) fall within (plus or minus) 1, 2, and 3 standard deviations of the mean.

Normal Distribution (bell-shaped)
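
The empirical rule can be checked numerically with `statistics.NormalDist` from the standard library. The helper name `share_within` is an assumption for this sketch.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard.cdf(k) - standard.cdf(-k)

shares = {k: round(share_within(k), 4) for k in (1, 2, 3)}
print(shares)  # {1: 0.6827, 2: 0.9545, 3: 0.9973}
```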

Finding the best value for one or more target variables given certain constraints. It shows what value a variable should have, given certain conditions or restraints.

Optimization Analysis

Organizations responsible for carrying out specific project activities in a manner and scope indicated in an application form.

Partners

It uses data, statistical algorithms, and machine learning techniques to determine the likelihood of potential outcomes. The aim is to have the best assessment of what will happen in the future, rather than simply understanding what has happened. Which type of analytics is this?

Predictive

Which type of analytics asks the question, "What will happen?"

Predictive

Python and R are the sole tools used in which two phases of the data analytics life cycle?

Predictive Modeling and Data Mining

It helps organizations make decisions. Which type of analytics is this?

Prescriptive

Predictive analytics uses collected data to come up with the future outcomes, and _______________ analytics takes that data and makes decisions that cause future outcomes.

Prescriptive

Predictive and ______________ analytics are two forward-looking tools used by business leaders.

Prescriptive

Which type of analytics asks the question, "How can we make it happen?"

Prescriptive

The role in the workplace in a data analytic project that coordinates and manages the triple constraints, and gets the data/reports out to the organization.

Project Manager

The role in the workplace in a data analytic project that provides direction

Project Manager

The role in the workplace in a data analytic project that provides funds:

Project Sponsor

What are the implications of undefined outcomes of potential data analytics projects?

The project will not be aligned with organizational needs.

A production-ready language with capacity to be a single tool that integrates with every part of your workflow.

Python

Any piece of functionality is always written the same way with:

Python

Coding and debugging are easy because of the simple syntax.

Python

Easier for people with software engineering background.

Python

Open-source, general-purpose programming language. It provides a more general approach and has several libraries that are useful to data science. Used by engineers and programmers.

Python

The indentation of code affects its meaning.

Python

Used by programmers who want to delve into data analysis or apply statistical techniques, and by developers and programmers who turn to data science.

Python

End-to-end platform that includes data integration.

Qlik

Known as nominal or ordinal. Describes the basic features of the data in a study.

Qualitative

Known as numerical, parametric, or interval data.

Quantitative

Easier for people with no coding experience.

R

Open-source programming language with new libraries or tools added continuously. Is mainly used for statistical analysis. Used by statisticians, educational researchers, etc.

R

Statistical models can be written with only a few lines.

R

The indentation of code does not affect its meaning.

R

The same piece of functionality can be written in several ways.

R

Used by statisticians, engineers, and scientists without computer programming skills. It's popular in academia, finance, pharmaceuticals, media, and marketing.

R

Used primarily in academics and research and is great for exploratory data analysis. In recent years, enterprise usage has rapidly expanded.

R

Is a technique that allows us to predict an outcome based on a set of predictor variables. It is like providing output given a set of inputs.

Regression
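
"Providing output given a set of inputs" can be illustrated with a one-predictor least-squares line. This is a minimal sketch with a single predictor; the `fit_line` helper and the study-hours data are hypothetical, and the data are deliberately exactly linear so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

hours = [1, 2, 3, 4, 5]          # predictor (input)
scores = [52, 54, 56, 58, 60]    # outcome (output)
slope, intercept = fit_line(hours, scores)
predicted = slope * 6 + intercept  # predict the score for 6 hours
print(slope, intercept, predicted)  # 2.0 50.0 62.0
```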

A collection of data items with predefined relationships between them (e.g., collection of tables).

Relational database

The role in the workplace in a data analytic project that pushes the team to ask interesting questions and identifies key problems.

Researcher

A domain specific language used in programming and designed for managing data in relational database management systems. Helps pull data from databases.

SQL
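
Pulling data from related tables can be sketched with SQLite, which ships with Python. The `customers` and `orders` tables, their columns, and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    total REAL
);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 5.0), (3, 2, 12.5);
""")

# Join the two related tables and total the orders per customer.
rows = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Ada', 25.0), ('Grace', 12.5)]
```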

A two-dimensional graph that is well suited to visualizing correlation or relationships. Each dot on the graph represents an outcome for two numerical variables of interest.

Scatterplot.

Program that searches for and identifies items in a database that correspond to keywords or characters specified by the user.

Search Engine

Loosely organized data in categories using tags (emails, CSV, XML, JSON doc, etc).

Semi-structured

A graph that shows frequencies related to the autocovariance time domain.

Spectral density

People who have an interest/power in any decision or activity of the project/organization. They could be involved in project plan development, change control boards, requirements gathering, risk management, and/or advocacy.

Stakeholders

By skipping the data exploration phase, the analyst will lack insight into the:

Structure of the data set

Type of data that is numbered and labeled; stored in an organized framework with columns and rows (e.g., in relational databases).

Structured

Machine learning algorithm that learns on a labeled dataset, providing an answer key that the algorithm can use to evaluate its accuracy on training data (e.g., classification and regression).

Supervised Model

Is a visual analytics engine that makes it easier to create interactive visual analytics in the form of dashboards.

Tableau

_______ data set is used to validate the model built.

Test (or validation)

Allows the analyst to move beyond describing the data to creating models that enable predictions of outcomes of interest. Python and R are used in automating the training and use of models. Which phase of the data analytics life cycle is this?

The Predictive Modeling Phase

The role in the workplace in a data analytic project that is the ninja that knows everything.

The Unicorn

A statistical tool that deals with a sequence of data in chronological order. A technique that looks for trends in data over time. It also involves separating data into an overall trend.

Time Series

The data set used to build the model. Data points in the __________ set are excluded from the test (validation) set.

Training
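
Keeping the two sets separate can be sketched as a simple shuffle-and-cut. The 75/25 proportion, the fixed seed, and the `train_test_split` helper written here are assumptions for illustration, not a prescribed procedure.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows, then cut it into two disjoint sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

rows = list(range(20))  # stand-in for 20 observations
train_rows, test_rows = train_test_split(rows)
print(len(train_rows), len(test_rows))  # 15 5
```

No observation appears in both sets, so the model is validated on data it did not see while being built.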

A regression analysis of a value as a function of time. It helps in understanding how and why things have changed over time (e.g., stock prices) and involves figuring out the path your data is on.

Trend Analysis

The five attributes used to determine quality in data are:

Uniqueness, relevance, reliability, validity, and accuracy.

Text-heavy information that isn't organized in a clearly defined framework (texts, videos, audio, etc.).

Unstructured

Provides unlabeled data that the algorithm tries to make sense of by extracting features and patterns on its own. Example: clustering, anomaly detection, neural network.

Unsupervised Model

Outliers not dealt with can cause problems with statistical models due to excessive ___________.

Variability

What is the most effective way of virtual communication?

Video Conferencing

A set of codes, or tags, that describes the text in a digital document.

XML
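
Tags describing the text they wrap can be seen by parsing a small document with Python's built-in `xml.etree.ElementTree`. The `order` document and its attributes are made up for the example.

```python
import xml.etree.ElementTree as ET

# Tags (order, item) and attributes (id, qty) describe the enclosed text.
document = (
    "<order id='42'>"
    "<item qty='2'>pen</item>"
    "<item qty='1'>notebook</item>"
    "</order>"
)

root = ET.fromstring(document)
items = [(item.text, int(item.get("qty"))) for item in root.findall("item")]
print(root.tag, root.get("id"))  # order 42
print(items)                     # [('pen', 2), ('notebook', 1)]
```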

Co-creation is

collaboration.

Programming languages are

compiled

Programming uses a __________ to convert the language to machine language.

compiler

Scripting languages are:

interpreted

Scripting uses an ______________ (like PowerShell) to convert the language to machine language.

interpreter

What decisions are necessary to initiate a data analytics project?

knowing the goals of the organization, resource availability, stakeholders, and the outcome of the project.

The general approach that the classification model uses is to

locate, compare, assign.

The _____ is the portion of the bell curve distribution having many occurrences far from the central part of the distribution.

long tail

