Course 4: Process Data from Dirty to Clean

CONCAT

A SQL function that adds strings together to create new text strings that can be used as unique keys
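As a minimal sketch of building a unique key by concatenation, the example below uses Python's built-in `sqlite3` module. Not every SQL engine provides a `CONCAT` function, so the sketch assumes the standard `||` concatenation operator instead. The `orders` table and its columns are hypothetical.

```python
import sqlite3

# Hypothetical table, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id TEXT, order_date TEXT)")
conn.execute("INSERT INTO orders VALUES ('C042', '2023-01-15')")

# '||' joins strings; engines that offer CONCAT would accept
# CONCAT(customer_id, '-', order_date) for the same result.
row = conn.execute(
    "SELECT customer_id || '-' || order_date AS order_key FROM orders"
).fetchone()
order_key = row[0]
print(order_key)
```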

CAST

A SQL function that converts data from one datatype to another
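A short sketch of CAST, run through Python's built-in `sqlite3` module. The literal values are made up for illustration, and other SQL engines may name their datatypes differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Text that looks like a number is converted to an integer before the addition.
total = conn.execute("SELECT CAST('19' AS INTEGER) + 1").fetchone()[0]

# A number is converted to text.
as_text = conn.execute("SELECT CAST(3.5 AS TEXT)").fetchone()[0]

print(total, repr(as_text))
```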

COALESCE

A SQL function that returns the first non-null value in a list
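COALESCE checks its arguments from left to right and returns the first one that is not null. The sketch below shows this with Python's built-in `sqlite3` module; the `contacts` table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical contact list with missing phone numbers (None becomes NULL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, mobile TEXT, landline TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [("Ana", None, "555-0100"), ("Ben", "555-0199", "555-0111"), ("Cy", None, None)],
)

# Prefer the mobile number, fall back to the landline, then to a default label.
rows = conn.execute(
    "SELECT name, COALESCE(mobile, landline, 'no phone') FROM contacts ORDER BY name"
).fetchall()
print(rows)
```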

CASE

A SQL statement that returns records that meet conditions by including an if/then statement in a query
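The if/then behavior of CASE can be sketched as follows, again using Python's built-in `sqlite3` module. The `scores` table and the score bands are assumptions made for the example.

```python
import sqlite3

# Hypothetical test scores, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (student TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)", [("Ana", 92), ("Ben", 71), ("Cy", 48)]
)

# Each WHEN is tested in order; ELSE covers rows that match no condition.
bands = conn.execute(
    """
    SELECT student,
           CASE
               WHEN score >= 90 THEN 'high'
               WHEN score >= 60 THEN 'pass'
               ELSE 'fail'
           END AS band
    FROM scores
    ORDER BY student
    """
).fetchall()
print(bands)
```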

Fill handle

A box in the lower-right-hand corner of a selected spreadsheet cell that can be dragged through neighboring cells in order to continue an instruction

Equation

A calculation that involves addition, subtraction, multiplication, or division (also called a math expression)

Cell reference

A cell or a range of cells in a worksheet typically used in formulas and functions

Delimiter

A character that indicates the beginning or end of a data item

Attribute

A characteristic or quality of data used to label a column in a table

Database

A collection of data stored in a computer system

Dataset

A collection of data that can be manipulated or analyzed as one unit

Data

A collection of facts

Bias

A conscious or subconscious preference in favor of or against a person, group of people, or thing

Bad data source

A data source that is not reliable, original, comprehensive, current, and cited (ROCCC)

Boolean data

A data type with only two possible values, usually true or false

CSV (comma-separated values) file

A delimited text file that uses a comma to separate values
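A small sketch of reading comma-delimited text with Python's standard `csv` module. The sample rows are invented; the quoted field shows how a comma inside a value is kept from being treated as a delimiter.

```python
import csv
import io

# Hypothetical CSV content; "Lee, Ana" is quoted because it contains a comma.
raw = 'name,city\n"Lee, Ana",Lima\nBen,Oslo\n'

# csv.reader splits each line on the delimiter (a comma by default).
rows = list(csv.reader(io.StringIO(raw)))
print(rows)
```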

Data science

A field of study that uses raw data to create new ways of modeling and understanding the unknown

Changelog

A file containing a chronologically ordered list of modifications made to a project

DISTINCT

A keyword that is added to a SQL SELECT statement to retrieve only non-duplicate entries
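A minimal sketch of DISTINCT, run through Python's built-in `sqlite3` module. The `visits` table is hypothetical.

```python
import sqlite3

# Hypothetical visit log in which one city appears twice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (city TEXT)")
conn.executemany("INSERT INTO visits VALUES (?)", [("Lima",), ("Oslo",), ("Lima",)])

# DISTINCT collapses repeated values into a single row each.
cities = [
    row[0]
    for row in conn.execute("SELECT DISTINCT city FROM visits ORDER BY city")
]
print(cities)
```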

Agenda

A list of scheduled appointments

Data element

A piece of information in a dataset

Cloud

A place to keep data online, rather than on a computer hard drive

Data governance

A process for ensuring the formal management of a company's data assets

Algorithm

A process or set of rules followed for a specific task

Cross-field validation

A process that ensures certain conditions for multiple data fields are satisfied
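One way to picture cross-field validation is a check that compares several fields of the same record. The sketch below is illustrative only; the field names and the two rules are assumptions, not part of the course material.

```python
from datetime import date

def cross_field_errors(record):
    """Return a list of problems that involve more than one field."""
    errors = []
    # Rule 1 (assumed): a period cannot end before it starts.
    if record["end_date"] < record["start_date"]:
        errors.append("end_date is before start_date")
    # Rule 2 (assumed): the parts must add up to the reported total.
    if record["online_sales"] + record["store_sales"] != record["total_sales"]:
        errors.append("sales parts do not add up to total_sales")
    return errors

# Hypothetical records, invented for illustration.
good = {"start_date": date(2023, 1, 1), "end_date": date(2023, 1, 31),
        "online_sales": 40, "store_sales": 60, "total_sales": 100}
bad = {"start_date": date(2023, 2, 1), "end_date": date(2023, 1, 31),
       "online_sales": 40, "store_sales": 60, "total_sales": 90}

print(cross_field_errors(good))
print(cross_field_errors(bad))
```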

Data warehousing specialist

A professional who develops processes and procedures to effectively store and organize data

Data engineer

A professional who transforms data into a useful format for analysis and gives it a reliable infrastructure

Fairness

A quality of data analysis that does not create or reinforce bias

Action-oriented question

A question whose answers lead to change

Confidence interval

A range of values that conveys how likely a statistical estimate is to reflect the population
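As a rough sketch, a confidence interval for a sample mean can be computed as the mean plus or minus a margin of error. The example assumes a normal approximation and uses made-up sample values; for a sample this small, a t-distribution would usually be the more careful choice.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical sample of survey scores, invented for illustration.
sample = [72, 75, 68, 80, 77, 74, 69, 79, 73, 76]

confidence_level = 0.95
# z-score for the chosen confidence level (about 1.96 for 95%).
z = NormalDist().inv_cdf(0.5 + confidence_level / 2)

# Margin of error = z * standard error of the mean.
margin = z * stdev(sample) / len(sample) ** 0.5
low, high = mean(sample) - margin, mean(sample) + margin

print(f"{mean(sample):.1f} +/- {margin:.1f}")
```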

Field

A single piece of information from a row or column of a spreadsheet; in a data table, typically a column in the table

Cookie

A small file stored on a computer that contains information about its users

DATEDIF

A spreadsheet function that calculates the number of days, months, or years between two dates
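The idea behind DATEDIF can be approximated in Python as shown below. This is a sketch, not the spreadsheet's implementation: the `datedif` helper is hypothetical, it only handles the "D", "M", and "Y" units, and end-of-month edge cases may differ from a real spreadsheet.

```python
from datetime import date

def datedif(start, end, unit):
    """Count complete days ("D"), months ("M"), or years ("Y") between two dates."""
    if unit == "D":
        return (end - start).days
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not yet complete
    if unit == "M":
        return months
    if unit == "Y":
        return months // 12
    raise ValueError(f"unsupported unit: {unit}")

start, end = date(2020, 1, 15), date(2021, 3, 10)
print(datedif(start, end, "D"), datedif(start, end, "M"), datedif(start, end, "Y"))
```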

COUNT

A spreadsheet function that counts the number of cells in a range that contain numeric values

COUNTA

A spreadsheet function that counts the total number of values within a specified range

CONCATENATE

A spreadsheet function that joins together two or more text strings

AVERAGE

A spreadsheet function that returns an average of the values from a selected range

COUNTIF

A spreadsheet function that returns the number of cells in a range that match a specified value
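COUNT, COUNTA, and COUNTIF are easy to confuse. In Google Sheets and Excel, COUNT tallies cells holding numbers, COUNTA tallies non-empty cells, and COUNTIF tallies cells that match a criterion. The Python sketch below mimics that behavior on a hypothetical column; the helper functions are illustrative stand-ins, not the spreadsheet functions themselves.

```python
# Hypothetical column of cells; None and "" stand for empty cells.
cells = [12, "n/a", None, 7.5, "", 12]

def count(cells):
    """Like COUNT: cells that hold a number."""
    return sum(
        isinstance(c, (int, float)) and not isinstance(c, bool) for c in cells
    )

def counta(cells):
    """Like COUNTA: cells that are not empty."""
    return sum(c is not None and c != "" for c in cells)

def countif(cells, value):
    """Like COUNTIF with an exact-match criterion."""
    return sum(c == value for c in cells)

print(count(cells), counta(cells), countif(cells, 12))
```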

Conditional formatting

A spreadsheet tool that changes how cells appear when values meet specific conditions

Data validation

A tool for checking the accuracy and quality of data

Field length

A tool for determining how many characters can be keyed into a spreadsheet field

Data model

A tool for organizing data elements and how they relate to one another

Find and replace

A tool that finds a specified search term and replaces it with something else

Dashboard

A tool that monitors live, incoming data

Data type

An attribute that describes a piece of data based on its values, its programming language, or the operations it can perform

Digital photo

An electronic or computer-based image usually in BMP or JPG format

Duplicate data

Any record that inadvertently shares data with another record
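A simple way to remove duplicate records is to remember which key values have already been seen. The sketch below assumes that two records are duplicates when both `id` and `email` match; the records themselves are invented.

```python
# Hypothetical customer records; the third repeats the first.
records = [
    {"id": 1, "email": "ana@example.com"},
    {"id": 2, "email": "ben@example.com"},
    {"id": 1, "email": "ana@example.com"},
]

seen = set()
unique_records = []
for record in records:
    key = (record["id"], record["email"])  # fields that define "same record"
    if key not in seen:
        seen.add(key)
        unique_records.append(record)  # keep only the first occurrence

print(unique_records)
```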

Clean data

Data that is complete, correct, and relevant to the problem being solved

Discrete data

Data that is counted and has a limited number of values

Dirty data

Data that is incomplete, incorrect, or irrelevant to the problem to be solved

Continuous data

Data that is measured and can have almost any numeric value

External data

Data that lives, and is generated, outside of an organization

Audio file

Digitized audio storage usually in an MP3, AAC, or other compressed format

Data-inspired decision-making

Exploring different data sources to find out what they have in common

Access control

Features such as password protection, user permissions, and encryption that are used to protect a spreadsheet

Data design

How information is organized

Compatibility

How well two or more datasets are able to work together

Big data

Large, complex datasets typically involving long periods of time, which enable data analysts to address far-reaching business problems

Borders

Lines that can be added around two or more cells on a spreadsheet

Descriptive metadata

Metadata that describes a piece of data and can be used to identify it at a later point in time

Administrative metadata

Metadata that indicates the technical source of a digital asset

Data range

Numerical values that fall between predefined maximum and minimum values

Data privacy

Preserving a data subject's information any time a data transaction occurs

Data security

Protecting data from unauthorized access or corruption by adopting safety measures

Analytical skills

Qualities and characteristics associated with using facts to solve problems

Data analyst

Someone who collects, transforms, and organizes data in order to draw conclusions, make predictions, and drive informed decision-making

Data interoperability

The ability to integrate data from multiple sources and a key factor leading to the successful use of open data among companies and governments

Data integrity

The accuracy, completeness, consistency, and trustworthiness of data throughout its life cycle

Consent

The aspect of data ethics that presumes an individual's right to know how and why their personal data will be used before agreeing to provide it

Currency

The aspect of data ethics that presumes individuals should be aware of financial transactions resulting from the use of their personal data and the scale of those transactions

Estimated response rate

The average number of people who typically complete a survey

Data analysis

The collection, transformation, and organization of data in order to draw conclusions, make predictions, and drive informed decision-making

Context

The condition in which something exists or happens

Data constraints

The criteria that determine whether a piece of data is clean and valid

Accuracy

The degree to which data conforms to the actual entity being measured or described

Completeness

The degree to which data contains all desired components or measures

Consistency

The degree to which data is repeatable from different points of entry or collection

Data visualization

The graphical representation of data

Data strategy

The management of the people, processes, and tools used in data analysis

Confidence level

The probability that a sample accurately reflects the greater population

Data manipulation

The process of changing data to make it more organized and easier to read

Data merging

The process of combining two or more datasets into a single dataset

Data transfer

The process of copying data from a storage device to computer memory or from one computer to another

Analytical thinking

The process of identifying and defining a problem, then solving it by using data in an organized, step-by-step manner

Data mapping

The process of matching fields from one data source to another
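Data mapping can be sketched as a lookup from source field names to target field names. The field names in `FIELD_MAP` and the `map_record` helper are hypothetical.

```python
# Hypothetical map from a source system's field names to a target schema.
FIELD_MAP = {"cust_name": "customer_name", "zip": "postal_code"}

def map_record(source_record):
    """Rename mapped fields; keep unmapped fields under their original names."""
    return {
        FIELD_MAP.get(field, field): value
        for field, value in source_record.items()
    }

mapped = map_record({"cust_name": "Ana Lee", "zip": "15001", "city": "Lima"})
print(mapped)
```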

Data anonymization

The process of protecting people's private or sensitive data by eliminating identifying information

Filtering

The process of showing only the data that meets specified criteria while hiding the rest

Data replication

The process of storing data in multiple locations

A/B testing

The process of testing two variations of the same web page to determine which page is more successful at attracting user traffic and generating revenue

Business task

The question or problem data analysis answers for a business

Data analytics

The science of data

Data life cycle

The sequence of stages that data experiences, which include plan, capture, manage, analyze, archive, and destroy

Data analysis process

The six phases of ask, prepare, process, analyze, share, and act whose purpose is to gain insights that drive informed decision-making

Experimenter bias

The tendency for different people to observe things differently (Refer to Observer bias)

Confirmation bias

The tendency to search for or interpret information in a way that confirms pre-existing beliefs

Data ecosystem

The various elements that interact with one another in order to produce, manage, store, organize, analyze, and share data

Data-driven decision-making

Using facts to guide business strategy

Data ethics

Well-founded standards of right and wrong that dictate how data is collected, shared, and used

Ethics

Well-founded standards of right and wrong that prescribe what humans ought to do, usually in terms of rights, obligations, benefits to society, fairness, or specific virtues

Data bias

When a preference in favor of or against a person, group of people, or thing systematically skews data analysis results in a certain direction

