B6 M4-M6

transforming data: cleaning data involves what

- determine the desired output
- deduplicate data points, remove inaccurate data, and account for outliers
- address missing fields
- remove unnecessary attributes
- ensure the data is accurate and complete after the cleaning process
- remove sensitive information if it is not needed for analysis
- split data for analysis
- ensure data points are properly formatted
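A few of the cleaning steps above can be sketched in plain Python. This is only an illustration: the records, the field names (`id`, `name`, `amount`, `ssn`), and the choice to drop rows with a missing amount are all assumptions, not part of the study material.

```python
# Hypothetical raw records; the field names are assumptions for illustration.
raw_rows = [
    {"id": "1", "name": " Ann ", "amount": "100.50", "ssn": "111-11-1111"},
    {"id": "1", "name": " Ann ", "amount": "100.50", "ssn": "111-11-1111"},  # duplicate
    {"id": "2", "name": "Bob", "amount": "", "ssn": "222-22-2222"},          # missing field
    {"id": "3", "name": "Cy", "amount": "75", "ssn": "333-33-3333"},
]


def clean(rows):
    """Deduplicate, drop rows with a missing amount, remove the sensitive
    attribute, and format the remaining fields."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        if row["id"] in seen_ids:      # deduplicate on the id field
            continue
        seen_ids.add(row["id"])
        if row["amount"] == "":        # address missing fields (here: drop the row)
            continue
        cleaned.append({
            "id": int(row["id"]),            # properly formatted types
            "name": row["name"].strip(),     # trim stray whitespace
            "amount": float(row["amount"]),
            # "ssn" is intentionally left out: sensitive and not needed
        })
    return cleaned


cleaned_rows = clean(raw_rows)
print(cleaned_rows)
```

Dropping the incomplete row is only one way to address a missing field; filling in a default or flagging the row for review may fit the desired output better.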

define relational database

allows data to be stored in different tables, with the tables linked through relationships using key fields

relational database concepts: define data dictionary

also referred to as metadata, provides information about the data in a database

transforming data: what are common manipulations of data

- appending demographic and socioeconomic data
- creating new variables that are a function of existing variables
- creating new variables that classify or categorize existing variables
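As a rough illustration of the last two manipulations, the sketch below derives one variable from existing variables and one that categorizes an existing variable. The records, the field names, and the 1000 threshold are hypothetical.

```python
# Hypothetical customer records; field names and the threshold are assumptions.
customers = [
    {"name": "Ann", "revenue": 1200.0, "cost": 900.0},
    {"name": "Bob", "revenue": 300.0, "cost": 350.0},
]


def add_derived_fields(row):
    """Return a copy of the row with two new variables added."""
    enriched = dict(row)
    # New variable that is a function of existing variables.
    enriched["profit"] = row["revenue"] - row["cost"]
    # New variable that classifies an existing variable.
    enriched["size"] = "large" if row["revenue"] >= 1000 else "small"
    return enriched


enriched_customers = [add_derived_fields(c) for c in customers]
print(enriched_customers)
```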

relational database concepts: define foreign keys

attributes in one table that are also primary keys in another table; a primary key in one table paired with a foreign key in another table is what creates a relationship between the tables
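A minimal sketch of that relationship, using Python's built-in `sqlite3` module. The `customers` and `orders` tables and their columns are made up for illustration; `orders.customer_id` is the foreign key that points at the primary key of `customers`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        last_name   TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        -- foreign key: also the primary key of the customers table
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount      REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Lee'), (2, 'Diaz');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# The key pair is what lets a query combine the two tables.
rows = conn.execute("""
    SELECT c.last_name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
""").fetchall()
print(rows)
```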

define data extraction

obtaining data from its source through an automated process, a semiautomated process, or manual extraction

define the 4 parts of Big Data governance

- big data confidentiality
- big data privacy
- big data ethics
- governance responsibility

define customer and marketing analytics

build consumer profiles and analyze spending preferences; allows organizations to optimize their marketing strategies

define continuous data

can take on any value (including decimal values) within a given (finite or infinite) interval

define ordinal data

categorical and not quantitative but it can be ranked in a meaningful way

relational database concepts: define data types

category of a data set or data point; ex: numerical or text

what is included into transforming data

- cleaning data
- validating data
- manipulating data

what are column charts effective at showing

comparisons

define big data

corporate accumulation of massive amounts of data that can be used for analysis, commonly referred to as data analytics

relational database concepts: define database keys

creates relationships within relational databases

what are the different uses of data analytics (6)

- customer and marketing analytics
- managerial and operational analytics
- risk and compliance analytics
- financial analytics
- audit analytics
- tax analytics

define big data privacy

customer and patient data must be safeguarded from unauthorized access to meet consumer privacy expectations as well as regulatory requirements

loading the data: define the data storage attribute - relationship between elements include validity, completeness, accuracy

- validity: data is entered in the correct manner
- completeness: no required data is missing
- accuracy: data entered is true and free from errors

what are the two steps in data extraction

- data identification
- obtaining the data

define structured data

defined organizational format that has specific parameters

loading the data: define the data storage attribute - relevance

defining the purpose helps users understand a repository's relevance

define symbol maps

demonstrate data on a geographic map through the use of symbols to help users compare and contrast values

define scatter plot

demonstrates the relationship between two variables; a simple trendline can be added as a form of simple regression to provide information on correlation
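To make the trendline idea concrete, here is a small sketch that fits a least-squares line and computes Pearson's correlation coefficient by hand. The function name `fit_trendline` and the sample data are hypothetical; charting tools normally do this calculation for you.

```python
def fit_trendline(xs, ys):
    """Least-squares slope and intercept, plus Pearson's correlation r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Hypothetical data that happens to lie exactly on the line y = 2x + 1.
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]
slope, intercept, r = fit_trendline(ad_spend, sales)
print(slope, intercept, r)
```

A value of r close to 1 or -1 suggests a strong linear relationship, while a value close to 0 suggests little linear relationship.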

explain geographic maps

demonstrate values on a geographical map and are typically colored or shaded in a manner to signify numeric values

relational database concepts: explain attributes (columns)

describe the characteristics or properties desired to be known about each entity; ex: last name

descriptive analytics =

describing or explaining what has occurred; backward looking

diagnostic analytics =

diagnosing or explaining why it occurred; backward looking

quantitative data =

discrete or continuous

relational database concepts: explain records (rows)

each record contains information about one entity within the table; ex: information about a single customer

transforming data: define validating data

ensure data is not lost or inappropriately modified in the cleaning process; may be a visual review, and basic statistical tests (max, min, averages) may be required
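The basic statistical tests mentioned above can be as simple as comparing summary figures before and after cleaning. The sketch below assumes a hypothetical list of amounts in which one duplicate was removed; the names `summarize`, `before`, and `after` are illustrative.

```python
from statistics import mean


def summarize(values):
    """Basic statistics used to compare a data set before and after cleaning."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": mean(values),
    }


before = [100.5, 100.5, 75.0, 20.0]   # hypothetical: contains one duplicate
after = [100.5, 75.0, 20.0]           # after deduplication

before_stats = summarize(before)
after_stats = summarize(after)

# Only the known duplicate should be gone; the extremes should be unchanged.
rows_removed = before_stats["count"] - after_stats["count"]
extremes_unchanged = (
    before_stats["min"] == after_stats["min"]
    and before_stats["max"] == after_stats["max"]
)
print(rows_removed, extremes_unchanged, after_stats["avg"])
```

An unexpected change in the count, minimum, maximum, or average is a signal to review the cleaning step more closely.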

define data management

ensuring that the data is maintained and stored appropriately; key for every organization

loading the data: define full refresh loading

entire data set is loaded, replacing the previous load

loading the data: what are the data storage requirements and define them (2)

- entity integrity: each table must have a unique primary key as a record identifier
- referential integrity: a change to a primary key in one table must also cause a change to any related foreign key in a linked table
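Both requirements can be demonstrated with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table and column names are hypothetical, and `ON UPDATE CASCADE` is just one way a database can keep foreign keys in step with a changed primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id) ON UPDATE CASCADE
    );
    INSERT INTO customers VALUES (1, 'Lee');
    INSERT INTO orders VALUES (10, 1);
""")

# Entity integrity: a second row with the same primary key is rejected.
try:
    conn.execute("INSERT INTO customers VALUES (1, 'Diaz')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Referential integrity: changing the primary key cascades to the foreign key.
conn.execute("UPDATE customers SET customer_id = 7 WHERE customer_id = 1")
order_customer_id = conn.execute(
    "SELECT customer_id FROM orders WHERE order_id = 10"
).fetchone()[0]
print(duplicate_rejected, order_customer_id)
```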

relational database concepts: explain tables

establish columns and rows to store specific types of data records; ex: customer table

ETL stands for

extract, transform, and load; used for data analytics

what are the 5 dimensions of big data

five Vs of big data
1. volume
2. velocity
3. variety
4. veracity
5. value

define boxplot

graphical displays that show lower and upper extremes, lower and upper quartiles, as well as the median data point
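The five values a boxplot displays can be computed with Python's `statistics` module. The data below is hypothetical, and quartile conventions differ between tools, so other software may report slightly different quartiles for the same data.

```python
from statistics import quantiles

values = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical observations

# method="inclusive" treats the data as the full population, so the
# quartiles fall within the observed minimum and maximum.
q1, median, q3 = quantiles(values, n=4, method="inclusive")

# (lower extreme, lower quartile, median, upper quartile, upper extreme)
five_number_summary = (min(values), q1, median, q3, max(values))
print(five_number_summary)
```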

define semi-structured data

hybrid of unstructured and structured data; a common example is a CSV file (a file of comma-separated values)

to leverage the power of evolving big data, companies must

identify a data point, then capture it, store it, protect it, and eventually dispose of it (if needed)

loading the data: what are the types of loading (3)

- initial (full) loading
- incremental loading
- full refresh loading

five Vs of big data: define value

insights the big data can yield; important to understand the question or business problem that needs to be solved

relational database concepts: define fields

intersection of a column and a row; the information inside the fields is known as data values

define nominal data

is the simplest form of data that cannot be ordered or ranked

transforming data: define manipulating data

it can be supplemented, enhanced, or otherwise manipulated in a way that adds value to the existing data points

define audit analytics

key to an audit:
- assessing risk
- providing assurance around certain operations
- establishing thresholds and expectations
- improving the quality of the audit by testing full populations

extraction: obtaining the data - explain automated extraction

likely use an application programming interface (API) so extraction is just a matter of a user application accessing the API to obtain the source data

define loading the data

load the data into a software program for analysis or into a data storage location

relational database concepts: what are the different database views

- logical database view
- physical database view

define big data ethics

make sure authorized personnel are granted the minimum level of access to the data necessary to perform their job functions

define flow charts

map out a process that has beginning and ending steps and a series of steps in between

extraction: obtaining the data - explain manual extraction

may have to use specialized data mining software or write customized queries to obtain the data

define financial analytics

monitor financial performance through data mining and ratio analysis on a continuous basis

define risk and compliance analytics

monitor their transactions through continuous auditing, continuous monitoring, and continuous reporting

loading the data: define data mart

much like a data warehouse but is more focused on a specific purpose such as marketing or logistics and is often a subset of a data warehouse

qualitative data =

nominal and ordinal data

loading the data: define load verification

once the data is loaded into the data repository, it is vital to validate it to ensure no data was lost in the process

loading the data: define incremental loading

only the differences between existing data and new data are added to the data repository
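The difference between loading the whole data set and loading only the differences can be sketched with a plain dictionary standing in for the data repository. The function names, the record keys, and the sample values are hypothetical, and this incremental sketch handles new and changed records only, not deletions.

```python
def full_refresh(repository, source):
    """Replace the previous load with the entire source data set."""
    repository.clear()
    repository.update(source)


def incremental_load(repository, source):
    """Add only records that are new or changed; return the keys written."""
    changed = [key for key, value in source.items() if repository.get(key) != value]
    for key in changed:
        repository[key] = source[key]
    return changed


repository = {}
day_one = {1: "Lee", 2: "Diaz"}
full_refresh(repository, day_one)          # on an empty repository this acts as the initial (full) load

day_two = {1: "Lee", 2: "Diaz-Ruiz", 3: "Kim"}
written = incremental_load(repository, day_two)

# A simple load verification: the repository should now match the source.
load_verified = repository == day_two
print(written, load_verified)
```

Only the changed record and the new record are written on the second day, which is why incremental loading tends to move far less data than a full refresh.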

loading the data: what are the different data storages

- operational data store (ODS)
- data warehouse
- data mart
- data lake

predictive analytics =

predicting what will occur; forward looking; forecast future data points by transforming insight into foresight, projecting what will happen based on historical data

prescriptive analytics =

prescribing what could or should occur; forward looking; how to achieve a desired event

what are the different database keys

primary and foreign

define data analytics

process of taking raw data, identifying trends, and then transforming that knowledge into insights that can help solve complex business problems

what are the types of data

- qualitative data
- quantitative data

five Vs of big data: define volume

quantity or amount of data points; may also factor in the size of the data

five Vs of big data: define variety

range of data types being processed or analyzed
- structured data
- semi-structured data
- unstructured data

extraction: obtaining the data - explain requesting the data

recipient of the request must be provided with full details on what is needed, including the data file type, format, time period, and required attributes

what is the most efficient and effective method for storing data

relational database

loading the data: what are the different data storage attributes

- relevance
- elements to be included and excluded
- relationship between elements (includes validity, completeness, and accuracy)

define veracity

reliability, quality, or integrity of data; processes should be implemented so that data is cleansed of irregularities, including duplicate fields, missing fields, incorrect formats or characters, transposed fields, or incorrect labeling

loading the data: define data lake

repository similar to a data warehouse but it contains both structured and unstructured data, with data mostly being in its natural or raw format

relational database concepts: define physical database view

represents how data is actually physically stored, processed and/or accessed within a database

relational database concepts: define logical database view

represents the type of data that is stored in a database and is intended to explain the contents as well as the logical structure of a database to users

extraction: what are the different ways to obtain data

- requesting the data
- automated extraction
- manual extraction
data can be internal or external to the organization

relational database concepts: define relationships

result from a link between a primary key in one table and a foreign key in another table

define big data confidentiality and what it includes

safeguarded to protect it from unauthorized access and exploitation; includes:
- copyrights
- patents
- trademarks
- trade secrets

define governance responsibility for big data

should be led by a designated individual, like a chief privacy officer, corporate compliance officer, or an equivalent job role; should have input from leaders across the organization, and the program should be periodically updated as necessary

define pie charts

show respective proportions of a whole

define waterfall chart

show the cumulative effect of a series of data points that make up a whole

relational database concepts: what are data queries and reports based on

some form of structured query language (SQL)

five Vs of big data: define velocity

speed of data accumulation or data processing

for big data privacy, to maintain compliance organizations must implement

strong governance practices surrounding what type of data can be collected, what disclosures to make as the data is collected, and what controls must be in place to protect that data

define transforming data

taking the often-unstructured raw data, cleaning it, manipulating it, and validating it to ensure it is accurate and ready for analysis

define dot plot

two-dimensional mapping of observations onto a coordinate plane

extraction: define data identification

understand the issue the business is trying to address to ensure the data request has the proper scope to resolve it

relational database concepts: define primary key

unique identifiers for a specific row within a table, made up of one or more attributes; each row must have a unique primary key; ex: social security numbers

define tax analytics

use this to organize tax information and guidelines, improve tax planning, and monitor tax performance indicators

define managerial and operational analytics

usually run in real time to maximize efficiencies and production within an organization

once the ETL process has been performed, data analytics can be utilized for a variety of tasks, including

validation, planning, insights, risk mitigation, and decision support

when is a stacked column chart effective

very effective when you want total comparisons as well as percentage breakdowns of the whole; each column is stratified to show additional details

loading the data: define data warehouse

very large data repositories that are centralized and utilized for reporting and analysis rather than for transaction purposes

extraction: data identification involves determining what 3 things

- what attributes to analyze
- time span to use
- what risks exist in the data

when are line charts best used

when showing quantitative trends over time and can help users discover hidden trends

when is a pyramid most helpful

when the bottom layer represents an action or a target that must first be achieved before the next layer up can take place; use when needing to understand underlying foundations or building blocks

loading the data: define initial (full) loading

when the entire data set is loaded into a repository

loading the data: define the data storage attribute - elements to be included and excluded

which attributes are included outlines the universe of data points housed within a repository

define discrete data

whole numbers and can only have certain values

what does data extraction dictate

will dictate the tools needed for designing the overall process of extraction

loading the data: define operational data store (ODS)

a repository of transactional data from multiple sources and is often a source for data warehouses

define data

a fact, occurrence, instance, or otherwise measurable observation; after raw data is organized, it adds value

define unstructured data

a format that does not have predefined parameters and generally lacks organization
