Data analytics

Visualization

(Refer to Data visualization)

SQL

(Refer to Structured Query Language)

CONCAT

A SQL function that adds strings together to create new text strings that can be used as unique keys

CAST

A SQL function that converts data from one datatype to another

SUBSTR

A SQL function that extracts a substring from a string variable

COALESCE

A SQL function that returns the first non-null value in a list
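
The four SQL functions above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, not a reference for any particular database: the literal values are made up for illustration, and the `||` operator stands in for CONCAT because older SQLite builds may not provide a CONCAT function.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

row = conn.execute(
    """
    SELECT
        'ORD' || '-' || '1001'       AS order_key,      -- string concatenation (stand-in for CONCAT)
        CAST('42' AS INTEGER) + 1    AS as_number,      -- text converted to an integer, then used in arithmetic
        SUBSTR('2024-03-15', 1, 4)   AS year_part,      -- 4 characters starting at position 1
        COALESCE(NULL, NULL, 'n/a')  AS first_non_null  -- first argument that is not NULL
    """
).fetchone()
conn.close()

print(row)  # ('ORD-1001', 43, '2024', 'n/a')
```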

Fill handle

A box in the lower-right-hand corner of a selected spreadsheet cell that can be dragged through neighboring cells in order to continue an instruction

Equation

A calculation that involves addition, subtraction, multiplication, or division (Refer to Math expression)

Math expression

A calculation that involves addition, subtraction, multiplication, or division (also called an equation)

Cell reference

A cell or a range of cells in a worksheet typically used in formulas and functions

Delimiter

A character that indicates the beginning or end of a data item

Attribute

A characteristic or quality of data used to label a column in a table

Pivot chart

A chart created from the fields in a pivot table

Database

A collection of data stored in a computer system

Dataset

A collection of data that can be manipulated or analyzed as one unit

Data

A collection of facts

Video file

A collection of images, audio files, and other data usually encoded in a compressed format such as MP4, M4V, MOV, AVI, or FLV

Record

A collection of related data in a data table, usually synonymous with row

Range

A collection of two or more cells in a spreadsheet

Query language

A computer programming language used to communicate with a database

Structured Query Language

A computer programming language used to communicate with a database

Bias

A conscious or subconscious preference in favor of or against a person, group of people, or thing

Bad data source

A data source that is not reliable, original, comprehensive, current, and cited (ROCCC)

Good data source

A data source that is reliable, original, comprehensive, current, and cited (ROCCC)

Pivot table

A data summarization tool used to sort, reorganize, group, count, total, or average data

Boolean data

A data type with only two possible values, usually true or false

Mandatory

A data value that cannot be left blank or empty

Metadata repository

A database created to store metadata

Normalized database

A database in which only related data is stored in each table

Relational database

A database that contains a series of tables that can be connected to form relationships

Long data

A dataset in which each row is one time point per subject, so each subject has data in multiple rows

Wide data

A dataset in which every data subject has a single row with multiple columns to hold the values of various attributes of the subject
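
The difference between wide and long data is easier to see with a small conversion. The sketch below uses a hypothetical helper, `wide_to_long`, and made-up yearly values; it turns one row per subject into one row per subject and time point.

```python
# Wide data: one row per subject, one column per year (illustrative values).
wide = [
    {"subject": "A", "2021": 10, "2022": 12},
    {"subject": "B", "2021": 7, "2022": 9},
]


def wide_to_long(rows, id_field):
    """Return one row per (subject, column) pair instead of one row per subject."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column != id_field:
                long_rows.append(
                    {id_field: row[id_field], "year": column, "value": value}
                )
    return long_rows


long_data = wide_to_long(wide, "subject")
for entry in long_data:
    print(entry)
```

Each subject now appears in two rows, one per year, which is the long layout described above.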

CSV (comma-separated values) file

A delimited text file that uses a comma to separate values
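
As a brief illustration, Python's standard csv module can read comma-delimited text. The sample content is invented; note that a value containing a comma has to be quoted so the comma is not treated as a delimiter.

```python
import csv
import io

# Illustrative CSV content; the quoted field contains a comma of its own.
raw = 'name,city\nAna,Lima\nBo,"Oslo, Norway"\n'

rows = list(csv.reader(io.StringIO(raw)))
print(rows)  # [['name', 'city'], ['Ana', 'Lima'], ['Bo', 'Oslo, Norway']]
```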

Spreadsheet

A digital worksheet

Data science

A field of study that uses raw data to create new ways of modeling and understanding the unknown

Foreign key

A field within a database table that is a primary key in another table (Refer to primary key)
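
A small sketch of how a foreign key links two tables, using Python's built-in sqlite3 module. The `customers` and `orders` tables and their contents are hypothetical; SQLite only enforces foreign keys when the `foreign_keys` pragma is switched on, which is why the sketch enables it first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),  -- foreign key
        total       REAL
    );
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Bo');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5);
    """
)

# The foreign key lets each order be matched to its customer.
joined = conn.execute(
    """
    SELECT customers.name, orders.total
    FROM orders
    JOIN customers ON customers.customer_id = orders.customer_id
    ORDER BY orders.order_id
    """
).fetchall()

# An order that points at a non-existent customer is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 5.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
conn.close()

print(joined)
print(rejected)
```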

Return on investment (ROI)

A formula that uses the metrics of investment and profit to evaluate the success of an investment
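
One common way to write this formula is net profit divided by the cost of the investment. The sketch below assumes that form; the function name `roi` and the figures are illustrative only.

```python
def roi(net_profit, investment_cost):
    """Return on investment as a ratio: net profit divided by investment cost."""
    if investment_cost == 0:
        raise ValueError("investment cost must be non-zero")
    return net_profit / investment_cost


# Illustrative figures: 2,500 in net profit on a 10,000 investment.
example = roi(2_500, 10_000)
print(f"{example:.0%}")  # 25%
```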

SUM

A function that adds the values of a selected range of cells

Split

A function that divides text around a specified character and puts each fragment into a new, separate cell

Math function

A function that is used as part of a mathematical formula

TRIM

A function that removes leading, trailing, and repeated spaces in data

MID

A function that returns a segment from the middle of a text string

LEFT

A function that returns a set number of characters from the left side of a text string

RIGHT

A function that returns a set number of characters from the right side of a text string

MAX

A function that returns the largest numeric value from a range of cells

LEN

A function that returns the length of a text string by counting the number of characters it contains
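
For readers who know Python, the text functions above (TRIM, MID, LEFT, RIGHT, LEN) have rough equivalents in string slicing. The helper names below are hypothetical, and they are approximations rather than exact copies of spreadsheet behavior: for example, this `trim` collapses every kind of whitespace, not only spaces.

```python
def trim(text):
    """Drop leading and trailing whitespace and collapse repeated whitespace."""
    return " ".join(text.split())


def mid(text, start, count):
    """Return `count` characters beginning at 1-based position `start`."""
    return text[start - 1:start - 1 + count]


def left(text, count):
    """Return the first `count` characters."""
    return text[:count]


def right(text, count):
    """Return the last `count` characters (empty string when count is 0)."""
    return text[max(len(text) - count, 0):]


word = "analytics"
print(trim("  data   analytics "))  # data analytics
print(mid(word, 3, 4))              # alyt
print(left(word, 4))                # anal
print(right(word, 3))               # ics
print(len(word))                    # 9, the counterpart of LEN
```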

Text string

A group of characters within a cell, most often composed of letters

DISTINCT

A keyword that is added to a SQL SELECT statement to retrieve only non-duplicate entries
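
A short sketch of DISTINCT in action, run through Python's built-in sqlite3 module against a hypothetical `customers` table in which one city appears twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (name TEXT, city TEXT);
    INSERT INTO customers VALUES ('Ana', 'Lima'), ('Bo', 'Oslo'), ('Cy', 'Lima');
    """
)

# Without DISTINCT, 'Lima' would be returned twice.
cities = conn.execute(
    "SELECT DISTINCT city FROM customers ORDER BY city"
).fetchall()
conn.close()

print(cities)  # [('Lima',), ('Oslo',)]
```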

Agenda

A list of scheduled appointments

Metric goal

A measurable goal set by a company and evaluated using metrics

Gap analysis

A method for examining and evaluating the current state of a process in order to identify opportunities for improvement in the future

Float

A number that contains a decimal

Data element

A piece of information in a dataset

Cloud

A place to keep data online, rather than on a computer hard drive

Function

A preset command that automatically performs a specified process or task using the data in a spreadsheet

Data governance

A process for ensuring the formal management of a company's data assets

Algorithm

A process or set of rules followed for a specific task

Cross-field validation

A process that ensures certain conditions for multiple data fields are satisfied

Hypothesis testing

A process to determine if a survey or experiment has meaningful results

Sponsor

A professional advocate who is committed to moving forward the career of another

Data warehousing specialist

A professional who develops processes and procedures to effectively store and organize data

Data engineer

A professional who transforms data into a useful format for analysis and gives it a reliable infrastructure

Fairness

A quality of data analysis that does not create or reinforce bias

Relevant question

A question that has significance to the problem to be solved

Specific question

A question that is simple, significant, and focused on a single topic or a few closely related ideas

Unfair question

A question that makes assumptions or is difficult to answer honestly

Time-bound question

A question that specifies a timeframe to be studied

Leading question

A question that steers people toward a certain response

Measurable question

A question whose answers can be quantified and assessed

Action-oriented question

A question whose answers lead to change

Confidence interval

A range of values that conveys how likely a statistical estimate reflects the population

Query

A request for data or information from a database

Regular expression (RegEx)

A rule that says the values in a table must match a prescribed pattern
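
As an illustration, the sketch below checks values against a pattern with Python's standard re module. The pattern (five digits, optionally followed by a hyphen and four more digits) and the sample values are assumptions chosen for the example.

```python
import re

# Hypothetical rule: five digits, optionally followed by "-" and four digits.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def matches_pattern(value):
    """True only when the whole value fits the prescribed pattern."""
    return ZIP_PATTERN.fullmatch(value) is not None


values = ["30301", "30301-1234", "3030", "ABCDE"]
valid = [v for v in values if matches_pattern(v)]
print(valid)  # ['30301', '30301-1234']
```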

Text data type

A sequence of characters and punctuation that contains textual information (also called string data type)

String data type

A sequence of characters and punctuation that contains textual information (also called text data type)

Formula

A set of instructions used to perform a calculation using the data in a spreadsheet

Field

A single piece of information from a row or column of a spreadsheet; in a data table, typically a column in the table

Metric

A single, quantifiable type of data that is used for measurement

Cookie

A small file stored on a computer that contains information about its users

Quantitative data

A specific and objective measure, such as a number, quantity, or range

DATEDIF

A spreadsheet function that calculates the number of days, months, or years between two dates

COUNT

A spreadsheet function that counts the number of cells in a range that contain numeric values

CONCATENATE

A spreadsheet function that joins together two or more text strings

AVERAGE

A spreadsheet function that returns an average of the values from a selected range

COUNTIF

A spreadsheet function that returns the number of cells in a range that match a specified value

MIN

A spreadsheet function that returns the smallest numeric value from a range of cells

VLOOKUP

A spreadsheet function that vertically searches for a certain value in a column to return a corresponding piece of information
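
The lookup idea can be sketched in a few lines of Python. The `vlookup` helper below is hypothetical and covers exact matches only: it scans the first column from top to bottom and returns the value from a 1-based column index in the first matching row.

```python
def vlookup(search_key, table, col_index):
    """Return the value in column `col_index` (1-based) of the first row
    whose first column equals `search_key`, or None when nothing matches."""
    for row in table:
        if row[0] == search_key:
            return row[col_index - 1]
    return None


# Illustrative product table: id, name, price.
products = [
    ("P001", "Widget", 4.5),
    ("P002", "Gadget", 9.0),
]

print(vlookup("P002", products, 3))  # 9.0
print(vlookup("P999", products, 2))  # None
```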

Remove duplicates

A spreadsheet tool that automatically searches for and eliminates duplicate entries from a spreadsheet

Conditional formatting

A spreadsheet tool that changes how cells appear when values meet specific conditions

Report

A static collection of data periodically given to stakeholders

Qualitative data

A subjective and explanatory measure of a quality or characteristic

United States Census Bureau

An agency in the U.S. Department of Commerce that serves as the nation's leading provider of quality data about its people and economy

Scope of work (SOW)

An agreed-upon outline of the tasks to be performed during a project

Merger

An agreement that unites two organizations into a single new one

Data type

An attribute that describes a piece of data based on its values, its programming language, or the operations it can perform

Digital photo

An electronic or computer-based image usually in BMP or JPG format

Primary key

An identifier in a database that references a column in which each value is unique (Refer to foreign key)

Null

An indication that a value does not exist in a dataset

Notebook

An interactive, editable programming environment for creating data reports and showcasing data skills

World Health Organization

An organization whose primary role is to direct and coordinate international health within the United Nations system

Outdated data

Any data that has been superseded by newer and more accurate information

Duplicate data

Any record that inadvertently shares data with another record

Networking

Building relationships by meeting people both in person and online

Naming conventions

Consistent guidelines that describe the content, creation date, and version of a file in its name

Typecasting

Converting data from one type to another

Metadata

Data about data

Second-party data

Data collected by a group directly from its audience and then sold

First-party data

Data collected by an individual or group using their own resources

Structured data

Data organized in a certain format such as rows and columns

Third-party data

Data provided from outside sources who didn't collect it directly

Open data

Data that is available to the public

Incorrect/inaccurate data

Data that is complete but inaccurate

Clean data

Data that is complete, correct, and relevant to the problem being solved

Discrete data

Data that is counted and has a limited number of values

Dirty data

Data that is incomplete, incorrect, or irrelevant to the problem to be solved

Continuous data

Data that is measured and can have almost any numeric value

Incomplete data

Data that is missing important fields

Unstructured data

Data that is not organized in any easily identifiable manner

Internal data

Data that lives within a company's own systems

External data

Data that lives, and is generated, outside of an organization

Inconsistent data

Data that uses different formats to represent the same thing

Audio file

Digitized audio storage usually in an MP3, AAC, or other compressed format

Data-inspired decision-making

Exploring different data sources to find out what they have in common

Access control

Features such as password protection, user permissions, and encryption that are used to protect a spreadsheet

Data design

How information is organized

Compatibility

How well two or more datasets are able to work together

Sample

In data analytics, a segment of a population that is representative of the entire population

Population

In data analytics, all possible data values in a dataset

Pixel

In digital imaging, a small area of illumination on a display screen that, when combined with other adjacent areas, forms a digital image

Big data

Large, complex datasets typically involving long periods of time, which enable data analysts to address far-reaching business problems

Borders

Lines that can be added around two or more cells on a spreadsheet

Descriptive metadata

Metadata that describes a piece of data and can be used to identify it at a later point in time

Structural metadata

Metadata that indicates how a piece of data is organized and whether it is part of one or more than one data collection

Administrative metadata

Metadata that indicates the technical source of a digital asset

Data range

Numerical values that fall between predefined maximum and minimum values

Sampling bias

Overrepresenting or underrepresenting certain members of a population as a result of working with a sample that is not representative of the population as a whole

Stakeholders

People who invest time and resources into a project and are interested in its outcome

General Data Protection Regulation of the European Union (GDPR)

A regulation of the European Union created to help protect people and their data

Data privacy

Preserving a data subject's information any time a data transaction occurs

Data security

Protecting data from unauthorized access or corruption by adopting safety measures

Ordinal data

Qualitative data with a set order or scale

Analytical skills

Qualities and characteristics associated with using facts to solve problems

Small data

Small, specific data points typically involving a short period of time, which are useful for making day-to-day decisions

Data analyst

Someone who collects, transforms, and organizes data in order to draw conclusions, make predictions, and drive informed decision-making

Mentor

Someone who shares knowledge, skills, and experience to help another grow both professionally and personally

Technical mindset

The ability to break things down into smaller steps or pieces and work with them in an orderly and logical way

Data interoperability

The ability to integrate data from multiple sources and a key factor leading to the successful use of open data among companies and governments

Data integrity

The accuracy, completeness, consistency, and trustworthiness of data throughout its life cycle

Problem domain

The area of analysis that encompasses every activity affecting or affected by a problem

Transaction transparency

The aspect of data ethics that presumes all data-processing activities and algorithms should be explainable and understood by the individual who provides the data

Consent

The aspect of data ethics that presumes an individual's right to know how and why their personal data will be used before agreeing to provide it

Ownership

The aspect of data ethics that presumes individuals own the raw data they provide and have primary control over its usage, processing, and sharing

Currency

The aspect of data ethics that presumes individuals should be aware of financial transactions resulting from the use of their personal data and the scale of those transactions

Openness

The aspect of data ethics that promotes the free access, usage, and sharing of data

Observation

The attributes that describe a piece of data contained in a row of a table

Estimated response rate

The average number of people who typically complete a survey

Data analysis

The collection, transformation, and organization of data in order to draw conclusions, make predictions, and drive informed decision-making

Context

The condition in which something exists or happens

Data constraints

The criteria that determine whether a piece of data is clean and valid

Consistency

The degree to which data is repeatable from different points of entry or collection

Validity

The degree to which the data conforms to constraints when it is input, collected, or created

Accuracy

The degree to which the data conforms to the actual entity being measured or described

Completeness

The degree to which the data contains all desired components or measures

Header

The first row in a spreadsheet that labels the type of data in each column

Geolocation

The geographical location of a person or device by means of digital information

Data visualization

The graphical representation of data

Data strategy

The management of the people, processes, and tools used in data analysis

Margin of error

The maximum amount that the sample results are expected to differ from those of the actual population
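
For a sample proportion, one widely used approximation of the margin of error is z * sqrt(p * (1 - p) / n). The sketch below assumes that formula, a z-score of 1.96 (roughly a 95% confidence level), and the most conservative proportion of 0.5; the function name and the sample size are illustrative.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z_score=1.96):
    """Approximate margin of error for a sample proportion."""
    if sample_size <= 0:
        raise ValueError("sample size must be positive")
    return z_score * math.sqrt(proportion * (1 - proportion) / sample_size)


# With 1,000 respondents the margin is a little over 3 percentage points.
moe = margin_of_error(1000)
print(f"{moe:.3f}")  # 0.031
```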

Length

The number of characters in a text string

Syntax

The predetermined structure of a language that includes all required words, symbols, and punctuation, as well as their proper placement

Confidence level

The probability that a sample size accurately reflects the greater population

Statistical power

The probability that a test of significance will recognize an effect that is present

Statistical significance

The probability that sample results are not due to random chance

Sorting

The process of arranging data into a meaningful order to make it easier to understand, analyze, and visualize

Data manipulation

The process of changing data to make it more organized and easier to read

Data merging

The process of combining two or more datasets into a single dataset

Data transfer

The process of copying data from a storage device to computer memory or from one computer to another

Analytical thinking

The process of identifying and defining a problem, then solving it by using data in an organized, step-by-step manner

Data mapping

The process of matching fields from one data source to another

Data anonymization

The process of protecting people's private or sensitive data by eliminating identifying information

Structured thinking

The process of recognizing the current problem or situation, organizing available information, revealing gaps and opportunities, and identifying options

Reframing

The process of restating a problem or challenge, then redirecting it toward a potential resolution

Filtering

The process of showing only the data that meets specified criteria while hiding the rest

Data replication

The process of storing data in multiple locations

A/B testing

The process of testing two variations of the same web page to determine which page is more successful at attracting user traffic and generating revenue

Business task

The question or problem data analysis resolves for a business

Turnover rate

The rate at which employees voluntarily leave a company

Root cause

The reason why a problem occurs

Data analytics

The science of data

SELECT

The section of a query that indicates from which column(s) to extract the data

FROM

The section of a query that indicates from which table(s) to extract the data

WHERE

The section of a query that specifies criteria that the requested data must meet
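
The three sections of a query (SELECT, FROM, WHERE) fit together as in the sketch below, run through Python's built-in sqlite3 module. The `orders` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER, region TEXT, total REAL);
    INSERT INTO orders VALUES (1, 'West', 120.0), (2, 'East', 80.0), (3, 'West', 45.5);
    """
)

large_west_orders = conn.execute(
    """
    SELECT order_id, total                 -- which columns to extract
    FROM orders                            -- which table to read
    WHERE region = 'West' AND total > 100  -- criteria the rows must meet
    """
).fetchall()
conn.close()

print(large_west_orders)  # [(1, 120.0)]
```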

Data life cycle

The sequence of stages that data experiences, which include plan, capture, manage, analyze, archive, and destroy

Data analysis process

The six phases of ask, prepare, process, analyze, share, and act whose purpose is to gain insights that drive informed decision-making

Experimenter bias

The tendency for different people to observe things differently (Refer to Observer bias)

Observer bias

The tendency for different people to observe things differently (also called experimenter bias)

Interpretation bias

The tendency to interpret ambiguous situations in a positive or negative way

Confirmation bias

The tendency to search for or interpret information in a way that confirms pre-existing beliefs

Revenue

The total amount of income generated by the sale of goods or services

Data ecosystem

The various elements that interact with one another in order to produce, manage, store, organize, analyze, and share data

Problem types

The various problems that data analysts encounter, including categorizing things, discovering connections, finding patterns, identifying themes, making predictions, and spotting something unusual

Data-driven decision-making

Using facts to guide business strategy

Order of operations

Using parentheses to group together spreadsheet values in order to clarify the order in which operations should be performed

Social media

Websites and applications through which users create and share content or participate in social networking

Data ethics

Well-founded standards of right and wrong that dictate how data is collected, shared, and used

Ethics

Well-founded standards of right and wrong that prescribe what humans ought to do, usually in terms of rights, obligations, benefits to society, fairness, or specific virtues

Data bias

When a preference in favor of or against a person, group of people, or thing systematically skews data analysis results in a certain direction

Redundancy

When the same piece of data is stored in two or more places

Unbiased sampling

When the sample of the population being measured is representative of the population as a whole

Random sampling

A way of selecting a sample from a population so that every member of the population has an equal chance of being chosen

Substring

A subset of a text string

Operator

A symbol that names the operation or calculation to be performed

Data validation

A tool for checking the accuracy and quality of data

SMART methodology

A tool for determining a question's effectiveness based on whether it is specific, measurable, action-oriented, relevant, and time-bound

Field length

A tool for determining how many characters can be keyed into a spreadsheet field

Data model

A tool for organizing data elements and how they relate to one another

Dashboard

A tool that monitors live, incoming data

Nominal data

A type of qualitative data that is categorized without a set order

Unique

A value that can't have a duplicate

Schema

A way of describing how something, such as data, is organized

