Data Vocab - Exam 1

Data analytics

A means of searching through large databases to make predictions or identify patterns or trends.

Relational databases

A means of storing data that gets rid of redundancies, and enforces business rules (such as referential integrity)

Value correlation

Checking collections of values against a rule that must hold true over the data

Data timeliness

Problems related to this include data that may eventually be accurate, but are not available when needed

Data accuracy for records

Problems related to this include missing records, incorrect records, and old records.

Classification

Process to assign data to categories; helps with prediction

A well-chosen average

When the word "average" is used to describe something with no indication of whether it is actually a mean, median, or mode
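To see why an unlabeled "average" can mislead, here is a minimal sketch using Python's standard `statistics` module. The salary figures are made up for illustration; the point is only that the three measures can differ widely on the same data.

```python
from statistics import mean, median, mode

# Hypothetical salaries: one large value pulls the mean upward.
salaries = [30_000, 30_000, 35_000, 40_000, 165_000]

averages = {
    "mean": mean(salaries),      # arithmetic average
    "median": median(salaries),  # middle value when sorted
    "mode": mode(salaries),      # most frequent value
}
print(averages)
```

Any of the three numbers could be reported as "the average salary", which is why the measure should always be named.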

Sampling bias

When not every item in the population has an equal chance to be chosen for the sample

Data quality

What data has when it satisfies the requirements of its intended use

Missing value

A data element that has no value in it, where the absence may itself be either accurate or inaccurate, is a ___

Co-occurrence grouping

A data approach meant to identify associations among individuals based on transactions that involve them.

Value inspection

A method to identify the presence of inaccuracies through visual inspection when it is not possible to create a clear rule that defines the boundary between right and wrong

Composite/concatenated primary key

A primary key composed of two or more data elements; this key is the primary key of a bridge table
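A composite key on a bridge table can be sketched with Python's built-in `sqlite3` module. The `student`, `course`, and `enrollment` tables and their columns are hypothetical names chosen for illustration; the sketch assumes an in-memory SQLite database with foreign-key enforcement switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE student (student_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE course  (course_id  TEXT PRIMARY KEY, title TEXT);
-- Bridge table: the pair (student_id, course_id) is the composite primary key.
CREATE TABLE enrollment (
    student_id TEXT REFERENCES student(student_id),
    course_id  TEXT REFERENCES course(course_id),
    PRIMARY KEY (student_id, course_id)
);
INSERT INTO student VALUES ('S1', 'Ana');
INSERT INTO course VALUES ('C1', 'Data Analytics'), ('C2', 'Auditing');
INSERT INTO enrollment VALUES ('S1', 'C1'), ('S1', 'C2');
""")

def try_insert(student_id, course_id):
    """Attempt an enrollment and report whether the database accepted it."""
    try:
        conn.execute("INSERT INTO enrollment VALUES (?, ?)", (student_id, course_id))
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

duplicate_result = try_insert("S1", "C1")  # same pair again: violates the composite key
orphan_result = try_insert("S9", "C1")     # unknown student: violates referential integrity
print(duplicate_result, orphan_result)
```

Neither column is unique on its own in `enrollment` (student `S1` appears twice), but the combination must be.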

Clustering

A process to group individuals based on characteristics or attributes

Data request form

A request for data that you do not have direct access to

Regression

A statistical method that predicts a variable based on the influence of one or more other variables.

Nominal data

A type of qualitative data that you can count, group, and express as a proportion

Dependent variable

A variable in regression analysis that is predicted based on the predictor variable(s)

Independent variable

A variable that predicts or explains another variable, used in regression analysis
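The relationship between the independent and dependent variable can be illustrated with a simple least-squares line. This is a minimal sketch with one predictor, written in plain Python; the `ad_spend` and `sales` figures are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

ad_spend = [1.0, 2.0, 3.0, 4.0]  # independent (predictor) variable, hypothetical
sales = [3.0, 5.0, 7.0, 9.0]     # dependent (predicted) variable, hypothetical

slope, intercept = fit_line(ad_spend, sales)
predicted = slope * 5.0 + intercept  # predicted sales for a new ad_spend of 5.0
print(slope, intercept, predicted)
```

Real data rarely falls exactly on a line as this toy data does, so a fitted line is an estimate rather than an exact rule.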

Degree of significance

A way to determine if an inadequate sample is being used to make a claim

Data reduction

Allows the user to focus on the most critical items in a large set of data

Profiling

An attempt to identify "typical" behavior by generating summary statistics about the data such as mean and standard deviation
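A profile of this kind can be sketched with the standard `statistics` module. The transaction amounts below are made up; the last one is deliberately unusual so that it stands out against the summary.

```python
from statistics import mean, stdev

# Hypothetical transaction amounts; the final value is far from the rest.
amounts = [120.0, 95.5, 130.25, 110.0, 101.75, 980.0]

profile = {
    "count": len(amounts),
    "min": min(amounts),
    "max": max(amounts),
    "mean": mean(amounts),
    "stdev": stdev(amounts),  # sample standard deviation
}
print(profile)
```

Comparing individual values against the mean and standard deviation is one way to decide which records deserve a closer look.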

Similarity matching

An attempt to identify similar individuals based on known data

Link prediction

An attempt to predict a relationship between two data items

Qualitative data

Categorical data that can only be counted and grouped, although sometimes the data can be ranked

The gee whiz graph

Changing the scale or proportion of a graph, or eliminating lower values, to make something look better or more significant than it really is

Structural analysis

Checking fields for uniqueness or consecutiveness, checking for orphans on collections of records, checking for circular references
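Three of these checks (uniqueness, consecutiveness, and orphans) can be sketched in plain Python. The `orders` records and the `customers` set are hypothetical and stand in for two related tables.

```python
from collections import Counter

# Hypothetical child records; customer_id should refer to a known customer.
orders = [
    {"order_id": 1, "customer_id": "C1"},
    {"order_id": 2, "customer_id": "C2"},
    {"order_id": 2, "customer_id": "C1"},  # repeated order_id
    {"order_id": 5, "customer_id": "C9"},  # no such customer
]
customers = {"C1", "C2"}  # hypothetical parent keys

ids = [order["order_id"] for order in orders]

# Uniqueness: an identifier should appear only once.
duplicate_ids = sorted(i for i, n in Counter(ids).items() if n > 1)

# Consecutiveness: look for gaps in the numbering.
missing_ids = sorted(set(range(min(ids), max(ids) + 1)) - set(ids))

# Orphans: child records whose parent key does not exist.
orphans = [o["order_id"] for o in orders if o["customer_id"] not in customers]

print(duplicate_ids, missing_ids, orphans)
```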

Costs described in a data quality program business case

Costs of rework, cost of lost customers, cost of late reporting, cost of incorrect decisions. Other costs include implementing large projects without understanding the underlying databases or taking too long to complete the project.

Identified costs

Costs that are already known as costs for a data quality project

Non-key data elements/descriptive attributes

Data elements that are included in a table that are neither primary keys nor foreign keys

Discrete data

Data represented by whole numbers; e.g., an Astros game score

Ordinal data

Data that can be counted and categorized, and the categories can be ranked; e.g., gold, silver, and bronze medals

Interval data

Data that can be counted and grouped like qualitative data, and the differences between each data point are meaningful

Structured v unstructured data

Data that can be used in a relational database v data such as video that cannot be readily adapted for use in a relational database

Continuous data

Data that can take on any value within a range; e.g., height

Normal distribution

Distribution of data where the mean, median, and mode are all equal; half of the observations fall below the mean and half are above the mean

Mastering the data

Identifying and determining what data is needed for answering the questions identified in a data analytic project

Steps in the IMPACT cycle

Identifying the questions to be answered, mastering the data, performing the test plan, addressing and refining results, communicating insights, and tracking outcomes

Potential for future costs included in a business case

Identifying what could happen if certain data elements contain inaccurate information

An example of "post hoc fallacy"

Assuming that because B follows A, A must have caused B

Ratio data

In addition to the data being countable and groupable, with meaningful intervals between data points, the data point of zero has meaning; for example, zero has a meaning of "the absence of"

Quantitative data

In these types of data, the intervals between data points are meaningful so that means, medians, and modes can be calculated

Data dictionary

Includes descriptions of all the data attributes (e.g., range, domain, numeric v. alphanumeric, etc.)

Element analysis

Looking at individual values in isolation to determine if they are valid
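Checking values in isolation can be sketched as one rule per field. The allowed state codes, the five-digit ZIP pattern, and the age range below are all assumptions made for illustration, not rules from any particular system.

```python
import re

VALID_STATES = {"CA", "NY", "TX"}  # hypothetical domain of allowed codes

def check_record(record):
    """Return the names of fields whose values fail their own rule."""
    problems = []
    if record["state"] not in VALID_STATES:
        problems.append("state")
    if re.fullmatch(r"\d{5}", record["zip"]) is None:
        problems.append("zip")
    if not 0 <= record["age"] <= 120:
        problems.append("age")
    return problems

records = [
    {"state": "TX", "zip": "77002", "age": 34},
    {"state": "ZZ", "zip": "7700", "age": 34},
    {"state": "CA", "zip": "90001", "age": 240},
]

# Map record index to its failing fields, skipping records with no problems.
invalid = {i: check_record(r) for i, r in enumerate(records) if check_record(r)}
print(invalid)
```

Each field is tested on its own; no rule here compares one field with another.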

Reverification

Manually going back to the original source of the information to check every value

Statisticulation

Misinforming people by the use of statistics

The argument for a data quality program

Placing your company in the best position to rapidly respond to changes in the business requires having and maintaining highly accurate data and metadata in the company's information systems

Data accuracy for data values

Refers to whether data values stored for an object are the correct values

Spurious precision

Saying people sleep an average of 6.71 hours a night without taking into account that most people will miss their guess by a quarter hour or more

Flat file

Storing data in one place (such as an Excel spreadsheet), rather than in multiple tables

Change-induced inconsistencies

System changes that alter the way information is recorded, or its granularity, describe ____

Assessment project

The part of the project that is all cost and no value unless it is implemented

Extracting, transforming, and loading of data

The process of mastering the data in a data analytic project

Primary key

The unique identifier for each record in a table; it is typically an alphanumeric code

Valid value

The value is in the collection of possible accurate values and is represented in a consistent and unambiguous way

Trusted data

This issue is related to problems with an application that leads to misinformation, leading to a lack of believability

Value representation consistency

Two values can both be correct and unambiguous and still cause problems; this is an issue of ____

Foreign key

Typically the primary key in one table that is used to link information from that table to another table.

Statistical error/standard error

Used to determine how accurately your sample can be taken to represent the population
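One common form is the standard error of the mean, the sample standard deviation divided by the square root of the sample size. The sketch below assumes that form; the sleep-hours figures are made up, and the larger sample is just the small one repeated to show the effect of sample size.

```python
from math import sqrt
from statistics import stdev

def standard_error(sample):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))

small_sample = [6.5, 7.0, 7.5, 6.0, 8.0]  # hypothetical hours of sleep
large_sample = small_sample * 20          # same spread, 100 observations

print(standard_error(small_sample), standard_error(large_sample))
```

With a similar spread, the larger sample gives a smaller standard error, so its mean is a more reliable estimate of the population mean.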

Business case

Used to present the difference between the gains and costs of a project, for example a case for a data quality program

Data relevance

Without this information specifically related to the use of the data, the data has a low level of quality

