Data Analytics Glossary

Bad data source

A data source that is not reliable, original, comprehensive, current, and cited (ROCCC)

HAVING

a SQL clause that filters the results of a query rather than the rows of the underlying table, and that can only be used with aggregate functions

CREATE TABLE

a SQL statement that adds a new table to a database, where it can be used by multiple people

SELECT INTO

a SQL clause that copies data from one table into a temporary table without adding the new table to the database

WITH

a SQL clause that creates a temporary table that can be queried multiple times
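
A minimal sketch of the WITH clause, run through Python's built-in sqlite3 module. The `trips` table, its columns, and the `long_trips` name are hypothetical and exist only for this illustration.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (station TEXT, minutes INTEGER)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("north", 12), ("north", 48), ("south", 7), ("south", 9)],
)

# WITH names a temporary result set (long_trips) that the main query
# can reference like a table.
long_trip_counts = conn.execute(
    """
    WITH long_trips AS (
        SELECT station, minutes FROM trips WHERE minutes > 10
    )
    SELECT station, COUNT(*) FROM long_trips GROUP BY station
    """
).fetchall()
print(long_trip_counts)
```

The temporary result set exists only for the duration of the statement that defines it.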

GROUP BY

a SQL clause that groups rows that have the same values from a table into summary rows
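
The following sketch shows GROUP BY together with HAVING, using Python's built-in sqlite3 module. The `orders` table and its values are assumptions made up for the example.

```python
import sqlite3

# Hypothetical in-memory table of orders, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 10.0), ("ana", 30.0), ("ben", 5.0), ("cy", 50.0)],
)

# GROUP BY collapses rows with the same customer into one summary row;
# HAVING then filters those summary rows by an aggregate value.
big_spenders = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 20
    ORDER BY customer
    """
).fetchall()
print(big_spenders)
```

A WHERE clause would filter individual rows before grouping, whereas HAVING filters the groups afterward.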

DROP TABLE

a SQL clause that removes a temporary table from a database

ORDER BY

a SQL clause that sorts results returned in a query

LIMIT

a SQL clause that specifies the maximum number of records returned in a query

CONCAT

a SQL function that adds strings together to create new text strings that can be used as unique keys

CONVERT

a SQL function that changes the unit of measurement of a value in data

OUTER JOIN

a SQL function that combines RIGHT and LEFT JOIN to return all records from both tables, matching them where possible

CAST

a SQL function that converts data from one data type to another
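
As a rough illustration, CAST can be tried out with Python's built-in sqlite3 module. The exact conversion rules vary between database systems, so the behavior shown here is SQLite's and should not be assumed elsewhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Convert text to an integer, a decimal to an integer, and an integer to text.
# In SQLite, casting a decimal to INTEGER truncates toward zero.
converted = conn.execute(
    "SELECT CAST('42' AS INTEGER), CAST(3.9 AS INTEGER), CAST(7 AS TEXT)"
).fetchone()
print(converted)
```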

SUBSTR

a SQL function that extracts a substring from a string variable

JOIN

a SQL function that is used to combine rows from two or more tables based on a related column

COUNT DISTINCT

a SQL function that counts only the distinct values in a specified range
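
A small sketch of counting distinct values, using Python's built-in sqlite3 module. The `visits` table and its contents are hypothetical.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (city TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?)",
    [("lisbon",), ("paris",), ("lisbon",), (None,)],
)

# COUNT(*) counts every row; COUNT(DISTINCT city) counts each
# non-null city only once.
total_rows, distinct_cities = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT city) FROM visits"
).fetchone()
print(total_rows, distinct_cities)
```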

ROUND

a SQL function that returns a number rounded to a certain number of decimal places

COALESCE

a SQL function that returns the first non-null value in a list
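
One way to see COALESCE in action is the sketch below, which uses Python's built-in sqlite3 module. The `contacts` table and the fallback text 'unknown' are assumptions for the example.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, mobile TEXT, landline TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [
        ("ana", None, "555-0100"),
        ("ben", "555-0101", "555-0102"),
        ("cy", None, None),
    ],
)

# COALESCE checks its arguments left to right and returns the first
# one that is not NULL.
best_numbers = conn.execute(
    "SELECT name, COALESCE(mobile, landline, 'unknown') "
    "FROM contacts ORDER BY name"
).fetchall()
print(best_numbers)
```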

INNER JOIN

a SQL function that returns records with matching values in both tables

RIGHT JOIN

a SQL function that will return all records from the right table and only the matching records from the left table

LEFT JOIN

a SQL function that will return all the records from the left table and only the matching records from the right table
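
The difference between INNER JOIN and LEFT JOIN can be sketched with Python's built-in sqlite3 module. The `employees` and `departments` tables are hypothetical. RIGHT JOIN is left out because older SQLite versions do not support it.

```python
import sqlite3

# Hypothetical in-memory tables, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (name TEXT, dept_id INTEGER);
    CREATE TABLE departments (dept_id INTEGER, dept_name TEXT);
    INSERT INTO employees VALUES ('ana', 1), ('ben', 2), ('cy', NULL);
    INSERT INTO departments VALUES (1, 'sales'), (2, 'ops');
    """
)

# INNER JOIN keeps only employees that have a matching department.
inner_rows = conn.execute(
    "SELECT e.name, d.dept_name FROM employees AS e "
    "INNER JOIN departments AS d ON e.dept_id = d.dept_id ORDER BY e.name"
).fetchall()

# LEFT JOIN keeps every employee; unmatched department columns are NULL.
left_rows = conn.execute(
    "SELECT e.name, d.dept_name FROM employees AS e "
    "LEFT JOIN departments AS d ON e.dept_id = d.dept_id ORDER BY e.name"
).fetchall()

print(inner_rows)
print(left_rows)
```

The short names `e` and `d` are table aliases, which keep the join condition readable.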

subquery

a SQL query that is nested inside a larger query

outer query

a SQL statement containing a subquery

CASE

a SQL statement that returns records that meet conditions by including an if/then statement in a query
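
A minimal sketch of CASE as an if/then inside a query, run with Python's built-in sqlite3 module. The `results` table and the passing threshold of 60 are assumptions for the example.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?)", [("ana", 72), ("ben", 41)]
)

# CASE evaluates the condition for each row and returns the matching label.
labeled = conn.execute(
    """
    SELECT name,
           CASE WHEN score >= 60 THEN 'pass' ELSE 'fail' END AS outcome
    FROM results
    ORDER BY name
    """
).fetchall()
print(labeled)
```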

inner query

a SQL subquery that is inside of another SQL statement
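
The relationship between an outer query and its inner query (subquery) might look like the sketch below, which uses Python's built-in sqlite3 module. The `sales` table is hypothetical.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (rep TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("ana", 100.0), ("ben", 300.0), ("cy", 200.0)],
)

# The inner query computes the average amount; the outer query
# uses that single value to filter rows.
above_average = conn.execute(
    """
    SELECT rep
    FROM sales
    WHERE amount > (SELECT AVG(amount) FROM sales)
    ORDER BY rep
    """
).fetchall()
print(above_average)
```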

Function (R)

a body of reusable code for performing specific tasks in R

fill handle

a box in the lower-right corner of a selected spreadsheet cell that can be dragged through neighboring cells in order to continue an instruction

calculus

a branch of mathematics that involves the study of rate of change and the changes between values that are related by a function

Tableau

a business intelligence and analytics platform that helps people visualize, understand, and make decisions with data

equation

a calculation that involves addition, subtraction, multiplication, or division

math expression

a calculation that involves addition, subtraction, multiplication, or division

external data

data that lives, and is generated, outside of an organization

cell reference

a cell or range of cells in a worksheet typically used in formulas and functions

delimiter

a character that indicates the beginning or end of a data item

Attribute

a characteristic or quality of data used to label a column in a table

pivot chart

a chart created from the fields in a pivot table

data frame

a collection of columns containing data, similar to a spreadsheet or SQL table

cluster

a collection of data points on a data visualization with similar values

database

a collection of data stored in a computer system

dataset

a collection of data that can be manipulated or analyzed as one unit

Data

a collection of facts

video file

a collection of images, audio files, and other data usually encoded in a compressed format such as MP4, M4V, MOV, AVI, or FLV

portfolio

a collection of materials that can be shared with potential employers

record

a collection of related data in a data table, usually synonymous with a row

range

a collection of two or more cells in a spreadsheet

array

a collection of values in spreadsheet cells

diverging color palette

a color theme that displays two ranges of data values using two different hues, with color intensity representing the magnitude of the values

case study

a common way for employers to assess job skills and gain insights into how a candidate approaches common data-related challenges

Structured Query Language (SQL)

a computer programming language used to communicate with a database

query language

a computer programming language used to communicate with a database

Log file

a computer-generated file that records events from operating systems and other software programs

bias

a conscious or subconscious preference in favor of or against a person, group of people, or thing

mental model

a data analyst's thought process and approach to a problem

good data source

a data source that is reliable, original, comprehensive, current, and cited (ROCCC)

pivot table

a data summarization tool used to sort, reorganize, group, count, total, or average data

Boolean data

a data type with only two possible values, usually true or false

mandatory

a data value that cannot be left blank or empty

filled map

a data visualization that colors areas in a map based on measurements or dimensions

combo chart

a data visualization that combines more than one visualization type

symbol map

a data visualization that displays a mark over a given longitude and latitude

bullet graph

a data visualization that displays data as a horizontal bar chart moving toward a desired value

packed bubble chart

a data visualization that displays data in clustered circles

bubble chart

a data visualization that displays individual data points as bubbles, comparing numeric values by their relative size

box plot

a data visualization that displays the distribution of values along an x-axis

Gantt chart

a data visualization that displays the duration of events or activities on a timeline

distribution graph

a data visualization that displays the frequency of various outcomes in a sample

static visualization

a data visualization that does not change over time unless it is edited

map

a data visualization that organizes data geographically

density map

a data visualization that represents concentrations, with color representing the number or frequency of data points in a given area on a map

scatter plot

a data visualization that represents relationships between different variables with individual data points without a connecting line

gauge chart

a data visualization that shows a single result within a progressive range of values

circle view

a data visualization that shows comparative strength in data

histogram

a data visualization that shows how often data values fall into certain ranges

heat map

a data visualization that uses color contrast to compare categories in a dataset

highlight table

a data visualization that uses conditional formatting and color on a table

area chart

a data visualization that uses individual data points for a changing variable connected by a continuous line with a filled in area underneath

column chart

a data visualization that uses individual data points for a changing variable, represented as vertical columns

line graph

a data visualization that uses one or more lines to display shifts or changes in data over time

pie chart

a data visualization that uses segments of a circle to represent the proportions of each data category compared to the whole

bar graph

a data visualization that uses size to contrast and compare two or more values

donut chart

a data visualization where segments of a ring represent data values adding up to a whole

metadata repository

a database created to store metadata

normalized database

a database in which only related data is stored in each table

temporary table

a database table that is created and exists temporarily on a database server

relational database

a database that contains a series of tables that can be connected to form relationships

long data

a dataset in which each row is one time point per subject, so each subject has data in multiple rows

wide data

a dataset in which every data subject has a single row with multiple columns to hold the values of various attributes of the subject
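
To make the contrast between long and wide data concrete, here is a small sketch in plain Python that reshapes long rows into a wide layout. The field names (`subject`, `year`, `score`) and the values are invented for the example.

```python
# Long data: one row per subject per time point (hypothetical values).
long_rows = [
    {"subject": "ana", "year": 2020, "score": 70},
    {"subject": "ana", "year": 2021, "score": 75},
    {"subject": "ben", "year": 2020, "score": 64},
    {"subject": "ben", "year": 2021, "score": 69},
]

# Wide data: one entry per subject, with one column per time point.
wide = {}
for row in long_rows:
    columns = wide.setdefault(row["subject"], {})
    columns[f"score_{row['year']}"] = row["score"]

print(wide)
```

Each subject appears in two rows of the long layout but in only one entry of the wide layout.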

conditional statement

a declaration that if a certain condition holds, then a certain event must take place

CSV (comma-separated values) file

a delimited text file that uses a comma to separate values
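
As an illustration of how a comma acts as a delimiter, the sketch below parses a small CSV text with Python's built-in csv module. The sample text is made up. It also shows that a quoted value can itself contain a comma.

```python
import csv
import io

# Hypothetical CSV text; the last value is quoted because it contains a comma.
raw_text = 'name,city\nAna,Lisbon\nBen,"Paris, France"\n'

# csv.reader splits each line on the delimiter, respecting quoted values.
parsed_rows = list(csv.reader(io.StringIO(raw_text)))
print(parsed_rows)
```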

spreadsheet

a digital worksheet

library

a directory containing all of a data analyst's installed packages

R notebook

a document for running code and displaying the graphs and charts that visualize the code

data science

a field of study that uses raw data to create new ways of modeling and understanding the unknown

changelog

a file containing a chronologically ordered list of modifications made to a project

R Markdown

a file format for making dynamic documents with R

foreign key

a field within a database table that is a primary key in another table

data structure

a format for organizing and storing data

return on investment (ROI)

a formula that uses the metrics of investment and profit to evaluate the success of an investment
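
The definition above does not spell out the formula. One common convention, assumed here, divides net profit by the cost of the investment. The function name and the sample figures are hypothetical.

```python
def return_on_investment(net_profit: float, investment_cost: float) -> float:
    """Return ROI as a fraction, assuming ROI = net profit / investment cost."""
    if investment_cost == 0:
        raise ValueError("investment_cost must be non-zero")
    return net_profit / investment_cost


# Hypothetical figures: 2,500 in net profit on a 10,000 investment.
roi = return_on_investment(2500.0, 10000.0)
print(f"{roi:.0%}")
```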

SPLIT

a function that divides text around a specified character and puts each fragment into a new, separate cell

nested function

a function that is completely contained within another function

math function

a function that is used as part of a mathematical formula

SUMPRODUCT

a function that multiplies arrays and returns the sum of those products
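
The calculation that SUMPRODUCT performs can be sketched in plain Python. This is an equivalent computation rather than the spreadsheet function itself, and the price and quantity lists are invented.

```python
# Hypothetical arrays of equal length.
prices = [2.0, 5.0, 1.5]
quantities = [3, 1, 4]

# Multiply the arrays element by element, then sum the products.
total = sum(price * qty for price, qty in zip(prices, quantities))
print(total)
```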

TRIM

a function that removes leading, trailing, and repeated spaces in data

MID

a function that returns a segment from the middle of a text string

LEFT

a function that returns a set number of characters from the left side of a text string

RIGHT

a function that returns a set number of characters from the right side of a text string

LEN

a function that returns the length of a text string by counting the number of characters it contains
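
The text functions above (TRIM, MID, LEFT, RIGHT, LEN) have rough equivalents in Python string operations, sketched below. These are approximations of the spreadsheet behavior, and the sample string is made up.

```python
text = "  data   analytics  "

# TRIM: drop leading and trailing spaces and collapse repeated ones.
# (split() also collapses tabs and newlines, which TRIM may not.)
trimmed = " ".join(text.split())

# LEFT(trimmed, 4) and RIGHT(trimmed, 9): characters from either end.
left_part = trimmed[:4]
right_part = trimmed[-9:]

# MID(trimmed, 6, 3): 3 characters starting at position 6 (1-based),
# which is index 5 in Python's 0-based slicing.
mid_part = trimmed[5:8]

# LEN: number of characters.
length = len(trimmed)

print(trimmed, left_part, right_part, mid_part, length)
```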

Python

a general-purpose programming language

chart

a graphical representation of data from a worksheet

labels and annotations (R)

a group of R functions used for customizing a plot

text string

a group of characters within a cell, most often composed of letters

vector (R)

a group of data elements of the same type stored in a one-dimensional sequence in R

Data Interoperability

a key factor leading to the successful use of open data among companies and governments

DISTINCT

a keyword that is added to a SQL SELECT statement to retrieve only non-duplicate entries

smoothing line (R)

a line on a data visualization that uses smoothing to represent a trend

agenda

a list of scheduled appointments

metric goal

a measurable goal set by a company and evaluated using metrics

gap analysis

a method for examining and evaluating the current state of a process in order to identify opportunities for improvement in the future

McCandless Method

a method for presenting data visualizations that moves from general to specific information

calculated field

a new field within a pivot table that carries out certain calculations based on the values of other fields

float

a number that contains a decimal

profit margin

a percentage that indicates how many cents of profit have been generated for each dollar of sales

code chunk

a piece of code added in an R Markdown file that is used to process, visualize, or analyze data

data element

a piece of information in a dataset

cloud

a place to keep data online, rather than a computer hard drive

data governance

a process for ensuring the formal management of a company's data assets

GAM (generalized additive model) smoothing (R)

a process for smoothing plots with a large number of points

algorithm

a process or set of rules followed for a specific task

cross-field validation

a process that ensures certain conditions for multiple data fields are satisfied

verification

a process to confirm that a data-cleaning effort was well executed and the resulting data is accurate and reliable

hypothesis testing

a process to determine if a survey or experiment has meaningful results

Loess smoothing (R)

a process used for smoothing plots with fewer than 1000 points

smoothing (R)

a process used to make data visualizations in R clearer and more readable

design thinking

a process used to solve complex problems in a user-centric way

sponsor

a professional advocate who is committed to moving forward the career of another

data warehousing specialist

a professional who develops processes and procedures to effectively store and organize data

data engineer

a professional who transforms data into a useful format for analysis and gives it a reliable infrastructure

Swift

a programming language for macOS, iOS, watchOS, and tvOS

PHP (Hypertext Preprocessor)

a programming language for web application development

HTML5

a programming language that provides structure for web pages and connects to hosting platforms

R

a programming language used for statistical analysis, visualization, and other data analysis

CSS (Cascading Style Sheets)

a programming language used for web page design that controls graphic elements and page presentation

Java

a programming language widely used to create enterprise web applications that can run on multiple clients

fairness

a quality of data analysis that does not create or reinforce bias

relevant question

a question that has significance to the problem to be solved

specific question

a question that is simple, significant, and focused on a single topic or a few closely related ideas

unfair question

a question that makes assumptions or is difficult to answer honestly

time-bound question

a question that specifies a timeframe to be studied

leading question

a question that steers people toward a certain response

measurable question

a question whose answers can be quantified and assessed

action-oriented question

a question whose answers lead to change

confidence interval

a range of values that conveys how likely a statistical estimate reflects the population

confidence level

the probability that a sample accurately reflects the greater population

absolute reference

a reference within a function that is locked so that rows and columns won't change if the function is copied

variable (R)

a representation of a value in R that can be stored for later use

query

a request for data or information from a database

regular expression (RegEx)

a rule that says the values in a table must match a prescribed pattern
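
A short sketch of checking values against a prescribed pattern, using Python's built-in re module. The pattern (exactly five digits) and the sample values are assumptions for the example.

```python
import re

# Hypothetical rule: a valid value is exactly five digits.
five_digits = re.compile(r"\d{5}")

values = ["30301", "3030", "9021O", "10001"]

# fullmatch succeeds only when the entire value fits the pattern.
valid_values = [v for v in values if five_digits.fullmatch(v)]
print(valid_values)
```

The third value fails because it ends in the letter O rather than a zero.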

string data type

a sequence of characters and punctuation that contains textual information

text data type

a sequence of characters and punctuation that contains textual information

Facets (R)

a series of functions that splits data into subsets in a matrix of panels

formula

a set of instructions used to perform a calculation using the data in a spreadsheet

function

a preset command that automatically performs a specified process or task using the data in a spreadsheet

elevator pitch

a short statement describing an idea or concept

field

a single piece of information from a row or column of a spreadsheet; in a data table, typically a column in the table

metric

a single, quantifiable type of data that is used for measurement

cookie

a small file stored on a computer that contains information about its users

substring

a smaller subset of a text string

IDE (Integrated Development Environment)

a software application that brings together all the tools a data analyst may want to use in a single place

Quantitative data

a specific and objective measure, such as a number, quantity, or range

SUMIF

a spreadsheet function that adds numeric data based on one condition

SUM

a spreadsheet function that adds the value of a selected range of cells

DATEDIF

a spreadsheet function that calculates the number of days, months, or years between two dates

COUNT

a spreadsheet function that counts the number of cells in a range that contain numeric values

COUNTA

a spreadsheet function that counts the total number of values within a specified range

CONCATENATE

a spreadsheet function that joins together two or more text strings

AVERAGE

a spreadsheet function that returns an average of the values from a selected range

AVERAGEIF

a spreadsheet function that returns the average of all cell values from a given range that meet a specified condition

MAX

a spreadsheet function that returns the largest numeric value from a range of cells

MAXIFS

a spreadsheet function that returns the maximum value from a given range that meets a specified condition

MINIFS

a spreadsheet function that returns the minimum value from a given range that meets a specified condition

COUNTIF

a spreadsheet function that returns the number of cells in a range that match a specified value

MIN

a spreadsheet function that returns the smallest numeric value from a range of cells

VLOOKUP

a spreadsheet function that vertically searches for a certain value in a column to return a corresponding piece of information
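
The idea behind VLOOKUP can be sketched in plain Python as a search down the first column of a table. The `vlookup` helper below is hypothetical and handles exact matches only. It is not the spreadsheet function itself.

```python
def vlookup(search_key, rows, column_index):
    """Search the first column for search_key and return the value from
    column_index (1-based, as in a spreadsheet). Returns None if not found."""
    for row in rows:
        if row[0] == search_key:
            return row[column_index - 1]
    return None


# Hypothetical product table: code, name, unit price.
products = [
    ("A100", "widget", 2.5),
    ("B200", "gadget", 4.0),
]

price = vlookup("B200", products, 3)
missing = vlookup("Z999", products, 2)
print(price, missing)
```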

MATCH

a spreadsheet function used to locate the position of a specific lookup value

VALUE

a spreadsheet function that converts a text string that represents a number to a numeric value

sort range

a spreadsheet menu function that sorts a specified range and preserves the cells outside the range

sort sheet

a spreadsheet menu function that sorts all data by the ranking of a specific sorted column and keeps data together across rows

remove duplicates

a spreadsheet tool that automatically searches for and eliminates duplicate entries from a spreadsheet

conditional formatting

a spreadsheet tool that changes how cells appear when values meet specific conditions

business metric

a standard of measurement used to solve a business task

report

a static collection of data periodically given to stakeholders

qualitative data

a subjective and explanatory measure of a quality or characteristic

operator

a symbol that names the operation or calculation to be performed

Markdown (R)

a syntax for formatting plain text files

Tidyverse (R)

a system of packages in R with a common design philosophy for data manipulation, exploration, and visualization

programming language

a system of words and symbols used to write instructions that computers follow

ranking

a system to position values of a dataset within a scale of achievement or status

summary table

a table used to summarize statistical information about data

data blending

a Tableau method that combines data from multiple data sources

FWF (fixed-width file)

a text file with a specific format, which enables the saving of textual data in an organized fashion

hypothesis

a theory that one might try to prove or disprove with data

data validation

a tool for checking the accuracy and quality of data

SMART methodology

a tool for determining a question's effectiveness based on whether it is specific, measurable, action-oriented, relevant, and time-bound

field length

a tool for determining how many characters can be keyed into a spreadsheet

data model

a tool for organizing data elements and how they relate to one another

dashboard filter

a tool for showing only the data that meets specific criteria while hiding the rest

Pipe (R)

a tool in R for expressing a sequence of multiple operations, represented with "%>%"

find and replace

a tool that finds a specified search term and replaces it with something else

decision tree

a tool that helps analysts make decisions about critical features of a visualization

legend

a tool that identifies the meaning of various elements in a data visualization

dashboard

a tool that monitors live, incoming data

matrix

a two-dimensional collection of data elements with rows and columns

nominal data

a type of qualitative data that is categorized without a set order

Package (R)

a unit of reproducible R code

unique

a value that can't have a duplicate

list

a vector whose elements can be of any type

channel

a visual aspect or variable that represents characteristics of the data in a visualization

mark

a visual object in a data visualization such as a point, line, or shape

aesthetic (R)

a visual property of an object in a plot

schema

a way of describing how something, such as data, is organized

random sampling

a way of selecting a sample from a population so that every possible type of the sample has an equal chance of being chosen
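
A minimal sketch of drawing a simple random sample with Python's built-in random module. The population of 100 numbered members, the sample size of 10, and the seed are all assumptions chosen for the example.

```python
import random

# Hypothetical population: 100 numbered members.
population = list(range(1, 101))

# A seeded generator makes the draw repeatable for demonstration purposes.
rng = random.Random(42)

# sample() picks without replacement, so every member has the same chance
# of being selected and no member is picked twice.
sample = rng.sample(population, 10)
print(sample)
```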

Tibble (R)

a streamlined data frame used in the Tidyverse

tidy data (R)

a way of standardizing the organization of data within R

mutate (R)

an R function that makes changes to a data frame, such as separating and merging columns or creating new variables

head() (R)

an R function that returns a preview of the column names and the first few rows of a dataset

ggplot2 (R)

an R package in Tidyverse that creates a variety of data visualizations by applying different properties to the data variables in R

dplyr (R)

an R package in Tidyverse that offers a consistent set of functions to complete common data manipulation tasks

tidyr (R)

an R package in Tidyverse used for data cleaning to make tidy data

readr (R)

an R package in Tidyverse used for importing data

Shiny (R)

an R package used to build interactive web apps with R code

United States Census Bureau

an agency in the U.S. Department of Commerce that serves as the nation's leading provider of quality data about its people and economy

Scope of work (SOW)

an agreed-upon outline of the tasks to be performed during a project

merger

an agreement that unites two organizations into a single new one

data type

an attribute that describes a piece of data based on its values, its programming language, or the operations it can perform

digital photo

an electronic or computer-based image usually in BMP or JPG format

C++

an extension of the C programming language that is used to create console games, such as those for Xbox

primary key

an identifier in a database that references a column in which each value is unique

null

an indication that a value does not exist in a dataset

notebook

an interactive, editable programming environment for creating data reports and showcasing data skills

Factor (R)

an object that stores categorical data where the data values are limited and usually based on a finite group, such as country or year

Ruby

an object-oriented programming language for web application development

C#

an object-oriented programming language used to create games and mobile apps in the .NET open source developer platform

CRAN (Comprehensive R Archive Network) (R)

an online archive with R packages, source code, manuals, and documentation

Jupyter Notebook

an open-source web application used to create and share documents that contain live code, equations, visualization, and narrative text

modulo

an operator (%) that returns the remainder when one number is divided by another
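
A quick illustration of the modulo operator in Python, which uses the same % symbol. The `wrap_index` helper is hypothetical. Only non-negative numbers are shown, since languages differ in how they handle negative operands.

```python
# 17 divided by 5 is 3 with a remainder of 2.
remainder = 17 % 5

# A number is even when dividing by 2 leaves no remainder.
is_even = 10 % 2 == 0


def wrap_index(position: int, length: int) -> int:
    """Hypothetical helper: wrap a position around a fixed length."""
    return position % length


print(remainder, is_even, wrap_index(7, 5))
```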

logical operator

an operator that returns a logical data type

assignment operator (R)

an operator used to assign values to variables and vectors

relational operator

an operator used to compare values, also known as a comparator

arithmetic operator

an operator used to perform basic math operations such as addition, subtraction, multiplication, and division

World Health Organization

an organization whose primary role is to direct and coordinate international health within the United Nations system

outdated data

any data that has been superseded by newer and more accurate information

duplicate data

any record that inadvertently shares data with another record

networking

building relationships by meeting people both in person and online

professional relationship building

building relationships by meeting people both in person and online

engagement

capturing and holding someone's interest and attention during a data presentation

inline code

code that can be inserted directly into the text of an R Markdown file

open-source

code that is freely available and may be modified and shared by the people who use it

Nested

code that performs a particular function and is contained within code that performs a broader function

data storytelling

communicating the meaning of a dataset with visuals and a narrative that are customized for an audience

naming conventions

consistent guidelines that describe the content, creation date, and version of a file in its name

typecasting

converting data from one type to another

metadata

data about data

second-party data

data collected by a group directly from its audience and then sold

first-party data

data collected by an individual or group using their own resources

structured data

data organized in a certain format such as rows and columns

third-party data

data provided from outside sources who didn't collect it directly

static data

data that doesn't change once it has been recorded

live data

data that is automatically updated

open data

data that is available to the public

incorrect/inaccurate data

data that is complete but inaccurate

clean data

data that is complete, correct, and relevant to the problem being solved

discrete data

data that is counted and has a limited number of values

dirty data

data that is incomplete, incorrect, or irrelevant to the problem to be solved

continuous data

data that is measured and can have almost any numeric value

incomplete data

data that is missing important fields

unstructured data

data that is not organized in any easily identifiable manner

internal data

data that lives within a company's own systems

inconsistent data

data that uses different formats to represent the same thing

dynamic visualizations

data visualizations that are interactive or change over time

audio file

digitized audio storage usually in an MP3, AAC, or other compressed format

Vignette (R)

documentation for an R package that describes the problem the package is designed to solve, explains how its functions can be used, and lists any dependencies on other packages

access control

features such as password protection, user permissions, and encryption that are used to protect a spreadsheet

Anscombe's Quartet

four datasets that have nearly identical summary statistics but contain different plotted values

data design

how information is organized

compatibility

how well two or more datasets are able to work together

sample

in data analytics, a segment of a population that is representative of the entire population

population

in data analytics, all possible values in a dataset

pixel

in digital imaging, a small area of illumination on a display screen that, when combined with other adjacent areas, forms a digital image

argument (R)

information needed by a function in R in order to run

big data

large, complex datasets typically involving long periods of time, which enable data analysts to address far-reaching business problems

borders

lines that can be added around two or more cells on a spreadsheet

underscores

lines used to underline words and connect text characters

descriptive metadata

metadata that describes a piece of data and can be used to identify it at a later point in time

structural metadata

metadata that indicates how a piece of data is organized and whether it is part of one or more than one data collection

administrative metadata

metadata that indicates the technical source of a digital asset

soft skills

nontechnical traits and behaviors that relate to how people work

data range

numerical values that fall between predefined maximum and minimum values

sampling bias

overrepresenting or underrepresenting certain members of a population as a result of working with a sample that is not representative of the population as a whole

stakeholders

people who invest time and resources into a project and are interested in its outcome

General Data Protection Regulation of the European Union (GDPR)

a regulation of the European Union created to help protect people and their data

data privacy

preserving a data subject's information any time a data transaction occurs

data security

protecting data from unauthorized access or corruption by adopting safety measures

ordinal data

qualitative data with a set order or scale

Analytical skills

qualities and characteristics associated with using facts to solve problems

reframing

restating a problem or challenge, then redirecting it toward a potential resolution

spotlighting

scanning through data to quickly identify the most important insights

transferable skills

skills and qualities that can transfer from one job or industry to another

small data

small, specific data points typically involving a short period of time, which are useful for making day-to-day decisions

data analyst

someone who collects, transforms, and organizes data in order to draw conclusions, make predictions, and drive informed decision-making

mentor

someone who shares knowledge, skills, and experience to help another grow both professionally and personally

aliasing

temporarily naming a table or column in a query to make it easier to read or write

headline

text at the top of a visualization that communicates the data being presented

label

text in a visualization that identifies a value or describes a scale

annotation

text that briefly explains data or helps focus the audience on a particular aspect of the data in a visualization

alternative text

text that provides an alternative to non-text content, such as images and videos

subtitle

text that supports a headline by adding context and description

technical mindset

the ability to break things down into smaller steps or pieces and work with them in an orderly and logical way

data integrity

the accuracy, completeness, consistency, and trustworthiness of data throughout its life cycle

visual form

the appearance of a data visualization that gives it structure and aesthetic appeal

problem domain

the area of analysis that encompasses every activity affecting or affected by a problem

transaction transparency

the aspect of data ethics that presumes all data processing activities and algorithms should be explainable and understood by the individual who provides the data

consent

the aspect of data ethics that presumes an individual's right to know how and why their personal data will be used before agreeing to provide it

ownership

the aspect of data ethics that presumes individuals own the raw data they provide and have primary control over its usage, processing, and sharing

currency

the aspect of data ethics that presumes individuals should be aware of financial transactions resulting from the use of their personal data and the scale of those transactions

openness

the aspect of data ethics that promotes the free access, usage, and sharing of data

observation

the attributes that describe a piece of data contained in a row of a table

estimated response rate

the average number of people who typically complete a survey

data analysis

the collection, transformation, and organization of data in order to draw conclusions, make predictions, and drive informed decision-making

Context

the condition in which something exists or happens

framework

the context a presentation needs to create logical connections that tie back to the business task and metrics

data constraints

the criteria that determine whether a piece of data is clean and valid

consistency

the degree to which data is repeatable from different points of entry or collection

validity

the degree to which the data conforms to constraints when it is input, collected, or created

accuracy

the degree to which the data conforms to the actual entity being measured or described

completeness

the degree to which the data contains all desired components or measures

emphasis

the design principle of arranging visual elements to focus the audience's attention on important information in a data visualization

movement

the design principle of arranging visual elements to guide the audience's eyes from one part of a data visualization to another

balance

the design principle of creating aesthetic appeal and clarity in a data visualization by evenly distributing visual elements

rhythm

the design principle of creating movement and flow in a data visualization to engage an audience

repetition

the design principle of repeating visual elements to demonstrate meaning in a data visualization

variety

the design principle of using different kinds of visual elements in a data visualization to engage an audience

pattern

the design principle of using similar visual elements to demonstrate trends and relationships in a data visualization

proportion

the design principle of using the relative size and arrangement of visual elements to demonstrate information in a data visualization

unity

the design principle of using visual elements that complement each other to create aesthetic appeal and clarity in a data visualization

pre-attentive attributes

the elements of a data visualization that an audience recognizes automatically without conscious effort

header

the first row in a spreadsheet that labels the type of data in each column

geolocation

the geographical location of a person or device by means of digital information

Geom (R)

the geometric object used to represent data

Data Visualization

the graphical representation of data

x-axis

the horizontal line of a graph, usually placed at the bottom, which is often used to represent time scales and discrete categories

data strategy

the management of the people, processes, and tools used in data analysis

margin of error

the maximum amount that the sample results are expected to differ from those of the actual population
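
One common approximation, for a sample proportion, is z * sqrt(p * (1 - p) / n). The sketch below assumes that formula and a z-score of 1.96 (roughly 95% confidence); the function name and inputs are hypothetical and not taken from this glossary.

```python
import math

def margin_of_error(sample_proportion, sample_size, z_score=1.96):
    """Approximate margin of error for a proportion (assumed formula)."""
    standard_error = math.sqrt(
        sample_proportion * (1 - sample_proportion) / sample_size
    )
    return z_score * standard_error

# Example: 50% of 1,000 respondents chose an option.
moe = margin_of_error(0.5, 1000)
print(f"Margin of error: about {moe * 100:.1f} percentage points")
```

With these inputs the result is about 3.1 percentage points, so the population value would be expected to lie roughly between 46.9% and 53.1%.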

correlation

the measure of the degree to which two variables change in relationship to each other
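
A widely used measure is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it by hand on made-up data; the choice of Pearson and the variable names are assumptions for illustration.

```python
import math

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)

hours = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 10]

r_up = pearson_correlation(hours, scores)          # rises together: 1.0
r_down = pearson_correlation(hours, scores[::-1])  # moves oppositely: -1.0
```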

story

the narrative of a data presentation that makes it meaningful and interesting

length

the number of characters in a text string

syntax

the predetermined structure of a language that includes all required words, symbols, and punctuation, as well as their proper placement

statistical power

the probability that a test of significance will recognize an effect that is present

statistical significance

the probability that sample results are not due to random chance

sorting

the process of arranging data into a meaningful order to make it easier to understand, analyze, and visualize
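
A minimal Python sketch, using hypothetical sales rows, that sorts from highest to lowest revenue:

```python
sales = [
    {"region": "West", "revenue": 1200},
    {"region": "East", "revenue": 3400},
    {"region": "North", "revenue": 800},
]

# Sort by the revenue field, largest first; the original list is unchanged.
by_revenue = sorted(sales, key=lambda row: row["revenue"], reverse=True)
regions_in_order = [row["region"] for row in by_revenue]
```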

data manipulation

the process of changing data to make it more organized and easier to read

data validation process

the process of checking and rechecking the quality of data so that it is complete, accurate, secure, and consistent

aggregation

the process of collecting or gathering many separate pieces into a whole

data composition

the process of combining the individual parts in a visualization and displaying them together as a whole

data merging

the process of combining two or more datasets into a single dataset
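
As an illustration, the sketch below merges two small, made-up datasets on a shared `customer_id` field. The datasets and field names are hypothetical.

```python
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Bo"},
]
orders = [
    {"customer_id": 1, "total": 30.0},
    {"customer_id": 2, "total": 12.5},
    {"customer_id": 1, "total": 7.5},
]

# Build a lookup from the first dataset, then attach names to each order.
names_by_id = {c["customer_id"]: c["name"] for c in customers}
merged = [
    {**order, "name": names_by_id[order["customer_id"]]}
    for order in orders
]
```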

relativity

the process of considering observations in relation or proportion to something else

data transfer

the process of copying data from a storage device to computer memory or from one computer to another

Data-inspired decision-making

the process of exploring different data sources to find out what they have in common

data aggregation

the process of gathering data from multiple sources and combining it into a single, summarized collection
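
A small sketch of the idea, assuming two hypothetical sources (in-store and online sales) that are combined into one total per region:

```python
from collections import defaultdict

store_sales = [("East", 100), ("West", 250)]
online_sales = [("East", 40), ("West", 60), ("North", 25)]

# Gather rows from both sources and summarize them by region.
totals = defaultdict(int)
for region, amount in store_sales + online_sales:
    totals[region] += amount

summary = dict(totals)
```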

computer programming

the process of giving instructions to a computer in order to perform an action or set of actions

analytical thinking

the process of identifying and defining a problem, then solving it by using data in an organized, step-by-step manner

data mapping

the process of matching fields from one data source to another

Mapping (R)

the process of matching up a specific variable in a dataset with a specific aesthetic

Data anonymization

the process of protecting people's private or sensitive data by eliminating identifying information
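
One simple approach is to drop the identifying fields before sharing a record. The sketch below assumes that `name`, `email`, and `phone` are the identifying fields; in practice, which fields identify a person depends on the dataset.

```python
# Hypothetical set of fields treated as identifying information.
IDENTIFYING_FIELDS = {"name", "email", "phone"}

def anonymize(record):
    """Return a copy of the record without the identifying fields."""
    return {
        key: value
        for key, value in record.items()
        if key not in IDENTIFYING_FIELDS
    }

patient = {
    "name": "Ana Ruiz",
    "email": "ana@example.com",
    "age_group": "30-39",
    "diagnosis": "flu",
}
anonymized = anonymize(patient)
```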

structured thinking

the process of recognizing the current problem or situation, organizing available information, revealing gaps and opportunities, and identifying options

filtering

the process of showing only the data that meets specified criteria while hiding the rest

data replication

the process of storing data in multiple locations

A/B testing

the process of testing two variations of the same web page to determine which page is more successful at attracting user traffic and generating revenue

coding

the process of writing instructions to a computer in the syntax of a specific programming language

business task

the question or problem data analysis resolves for a business

turnover rate

the rate at which employees voluntarily leave a company

root cause

the reason why a problem occurs

data analytics

the science of data

SELECT

the section of a query that indicates the subset of a dataset

FROM

the section of a query that indicates where the selected data comes from

WHERE

the section of a query that specifies criteria that the requested data must meet
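
The SELECT, FROM, and WHERE sections can be seen together in one query. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `orders` table; the table, columns, and values are made up for illustration.

```python
import sqlite3

# In-memory database with a small, made-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Ana", 30.0), (2, "Bo", 12.5), (3, "Ana", 7.5)],
)

rows = conn.execute(
    "SELECT customer, total "  # which columns to return
    "FROM orders "             # where the data comes from
    "WHERE total > 10 "        # criteria each row must meet
    "ORDER BY id"
).fetchall()
conn.close()
```

Only the two orders with a total above 10 are returned.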

data life cycle

the sequence of stages that data experiences, which include plan, capture, manage, analyze, archive, and destroy

HTML (Hypertext Markup Language)

the set of markup symbols or codes used to create a webpage

data analysis process

the six phases of ask, prepare, process, analyze, share and act whose purpose is to gain insights that drive informed decision-making

statistics

the study of how to collect, analyze, summarize, and present data

experimenter bias

the tendency for different people to observe things differently (also called observer bias)

interpretation bias

the tendency to interpret ambiguous situations in a positive or negative way

confirmation bias

the tendency to search for or interpret information in a way that confirms pre-existing beliefs

revenue

the total amount of income generated by the sale of goods or services

data ecosystem

the various elements that interact with one another in order to produce, manage, store, organize, analyze, and share data

problem types

the various problems that data analysts encounter, including categorizing things, discovering connections, finding patterns, identifying themes, making predictions, and spotting something unusual

data-driven decision-making

using facts to guide business strategy

order of operations

using parentheses to group together spreadsheet values in order to clarify the order in which operations should be performed
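
The same rule applies in most programming languages. As a quick Python illustration (a spreadsheet formula such as `=(2+3)*4` behaves the same way):

```python
# Multiplication is performed before addition by default.
without_parentheses = 2 + 3 * 4    # 2 + 12 = 14

# Parentheses force the addition to happen first.
with_parentheses = (2 + 3) * 4     # 5 * 4 = 20
```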

social media

websites and applications through which users create and share content or participate in social networking

data ethics

well-founded standards of right and wrong that dictate how data is collected, shared, and used

ethics

well-founded standards of right and wrong that prescribe what humans ought to do, usually in terms of rights, obligations, benefits to society, fairness, or specific virtues

data bias

when a preference in favor of or against a person, group of people, or thing systematically skews data analysis results in a certain direction

causation

when an action directly leads to an outcome, such as a cause-effect relationship

redundancy

when the same piece of data is stored in two or more places

unbiased sampling

when the sample of the population being measured is representative of the population as a whole

