B6: M4

Data storage attributes

-Relevance: a defined purpose helps users understand a repository's relevance
-Elements to be included and excluded
-Relationship between elements
-Validity, completeness, and accuracy:
----> validity: data is entered in the right manner (ex: units for measures of length all in inches)
----> completeness: no required data is missing
----> accuracy: data is true and free from errors

Best practices for data visualization:
-scale appropriately: x-axis and y-axis should not be misleading; scaling should start at ___
-use legends appropriately... ?
-avoid bias
-use consistent time periods
-use colors that can be easily seen and follow cultural norms
-use clear and easy-to-read titles and labels

0. If a chart needs more than 4-5 colors, avoid using a legend; it could be difficult to follow

Data storage requirements

1) Entity integrity -each table must have a unique primary key as its record identifier
2) Referential integrity -a change to a primary key in one table must also cause a change to any related foreign key in a linked table
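
A minimal sketch of the two integrity rules, using Python's built-in sqlite3 module. The customer and invoice tables and their columns are hypothetical, chosen only for illustration. Other database engines enforce the same rules with their own syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables for illustration.
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE invoice (
        invoice_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL
            REFERENCES customer(customer_id) ON UPDATE CASCADE,
        amount REAL NOT NULL
    )
    """
)
conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO invoice VALUES (100, 1, 250.0)")


def violates_integrity(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Entity integrity: a second row with the same primary key is rejected.
duplicate_key_rejected = violates_integrity("INSERT INTO customer VALUES (1, 'Duplicate')")

# Referential integrity: an invoice cannot point at a customer that does not exist.
orphan_rejected = violates_integrity("INSERT INTO invoice VALUES (101, 999, 10.0)")

# A change to the primary key cascades to the related foreign key.
conn.execute("UPDATE customer SET customer_id = 2 WHERE customer_id = 1")
invoice_customer_id = conn.execute(
    "SELECT customer_id FROM invoice WHERE invoice_id = 100"
).fetchone()[0]
```

Here ON UPDATE CASCADE is one way to make a primary key change flow through to the foreign key. A database could instead be configured to reject the change.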

Database views are ways in which a database, its contents, and/or its structure can be depicted. Views are broken into two broad types. What are they?

1) Logical database view -shows the type of data stored in the database; intended to explain the contents and logical structure of a database to users
2) Physical database view -represents how data is actually physically stored, processed, and/or accessed within a database

Using data analytics: -four types of audit analytics

1) assessing risk
2) providing assurance around certain operations
3) establishing thresholds and expectations
4) improving the quality of the audit by testing full populations

ETL: Types of loading

1) Initial (full) loading -entire data set is loaded into the repository -used when the repository holds no prior iterations of the data
2) Incremental loading -only the differences between existing and new data are added to the repository
3) Full refresh loading -entire data set is loaded, replacing the previous load
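
The three loading types can be sketched with plain dictionaries standing in for the source and the repository. The function names are hypothetical, and this simplified incremental load only adds or updates records. It does not handle records deleted at the source.

```python
def initial_load(source):
    """Initial (full) load: the repository is empty, so every record is loaded."""
    return dict(source)


def incremental_load(repository, source):
    """Incremental load: add or update only records that differ from the repository."""
    changed = {key: row for key, row in source.items() if repository.get(key) != row}
    repository.update(changed)
    return changed


def full_refresh_load(repository, source):
    """Full refresh: discard the previous load and reload the entire data set."""
    repository.clear()
    repository.update(source)
    return repository


day1 = {1: "Acme", 2: "Birch"}
day2 = {1: "Acme", 2: "Birch Ltd", 3: "Cedar"}

repo = initial_load(day1)
delta = incremental_load(repo, day2)  # only the changed and new records
```

After the incremental load, delta holds only records 2 and 3, while repo matches day2 in full.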

Types of data? categories within?

1) Qualitative -nonnumerical and categorical
-Nominal: simplest form, cannot be ordered or ranked
-Ordinal: categorical and not quantitative, but can be ranked in a meaningful way (example: high, medium, low)
2) Quantitative -numerical
-Discrete: whole numbers that can take only certain values
-Continuous: any value (decimals too) within a given (finite or infinite) interval

Data Extraction: -first step? -second step?

1) Understand the issue the business is trying to address, to ensure the data request has the proper scope to resolve it -determine which attributes to analyze, the time span to use, and what risks exist in the data
2) Obtain the data -request the data, then extract it by automated or manual means

Intellectual property is considered what kind of data? What are the 4 most common types of intellectual property?

Confidential
Copyrights: original works of authorship
Patents: protection for an invention that is unique in design or utility
Trademarks: words, symbols, phrases, designs, or a combination
Trade secrets: information with competitive advantage or commercial value that other companies do not know

What acts as unique identifiers and creates relationships within a relational database? Types?

Keys
1) Primary key -UNIQUE for a specific row within a table; made up of one or more attributes -example: SSN
2) Foreign key -attributes in one table that are also primary keys in another table

What is vital after loading data?

Load verification -validate the load and make sure no data was lost in the process
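
One possible way to verify a load is to compare row counts and a checksum of the rows before and after loading. This is a sketch under assumptions: the function names are hypothetical, rows are tuples, and the comparison is type-sensitive (the integer 1 and the float 1.0 would not match).

```python
import hashlib


def fingerprint(rows):
    """Return (row count, order-independent digest) for an iterable of row tuples."""
    digests = sorted(hashlib.sha256(repr(row).encode("utf-8")).hexdigest() for row in rows)
    combined = hashlib.sha256("".join(digests).encode("utf-8")).hexdigest()
    return len(digests), combined


def verify_load(source_rows, loaded_rows):
    """True when the loaded data has the same rows as the source, in any order."""
    return fingerprint(source_rows) == fingerprint(loaded_rows)
```

A mismatch signals that a row was lost, added, or altered somewhere between extraction and loading, but not which one.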

Big Data Ethics -Organizations should make sure authorized personnel are granted the ____ level of access to data necessary to perform their job functions -includes read, create, edit, and delete capabilities -eliminate bias

MINIMUM

Different places you can store data?

Operational data store (ODS)
Data warehouse
Data mart
Data lake

What do line charts show?

Quantitative trends over time, help users discover hidden trends

Within an organization, data can be stored in a variety of ways. However, one of the most efficient and effective methods for many use cases is to store data where? Explain?

A relational database allows data to be stored in different tables; the tables are linked through relationships using key fields

Relational database... how do relationships form?

Relationships result from a link between the primary key in one table and a foreign key in another table. The link relates the two tables, enabling users to retrieve info from both tables simultaneously
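
A small sketch of retrieving info from two related tables at once, using Python's built-in sqlite3 module. The customer and orders tables and their rows are hypothetical. The JOIN matches the foreign key in orders to the primary key in customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        total REAL
    );
    INSERT INTO customer VALUES (1, 'Acme'), (2, 'Birch');
    INSERT INTO orders VALUES (10, 1, 99.5), (11, 1, 20.0), (12, 2, 45.0);
    """
)

# Each order row is paired with the customer row its foreign key points to.
rows = conn.execute(
    """
    SELECT c.name, o.order_id, o.total
    FROM customer AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
    """
).fetchall()
```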

Extracting data is typically done via query tools, most COMMONLY using programming languages that are based on some form of? Commands?

Structured Query Language (SQL)
Commands: SELECT, FROM, WHERE
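
A short illustration of the three commands, run through Python's built-in sqlite3 module. The sales table and its contents are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "East", 120.0), (2, "West", 80.0), (3, "East", 45.5)],
)

# SELECT names the columns, FROM names the table, WHERE filters the rows.
east_sales = conn.execute(
    "SELECT sale_id, amount FROM sales WHERE region = ? ORDER BY sale_id",
    ("East",),
).fetchall()
```

Passing the filter value as a parameter (the ? placeholder) rather than pasting it into the query text is the usual way to avoid SQL injection.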

Relational Database -what are the tables? -what are the columns? -what are the rows? -what are fields? -data types?

Tables: organizational structures within a relational database that establish columns and rows to store specific types of data records
Columns: ATTRIBUTES -describe the characteristics or properties desired to be known about each entity
Rows: RECORDS -contain info about one entity within the table (example: a row in a Customer table provides info about a single customer)
Fields: intersection of a column and a row, where data is entered
Data types: category of a data set or data point -numerical, Boolean (yes/no, true/false)

Data lake

a repository similar to a data warehouse, but it contains both structured and unstructured data, with data mostly being in its natural or raw format

When can data analytics take place?

After the ETL process. Data analytics is the process of taking raw data, identifying trends, and transforming that knowledge into insights that can help solve complex business problems

At first, data can lack meaning. When does it acquire additional value?

after organization, transformation, and further processing

Data Extraction: -what kind of process? -the native source and means of accessing the data must be determined in the initial ETL setup phase. This will dictate?

Automated, semi-automated, or manual -dictates the tools needed for designing the overall process of extraction

Big Data Privacy: -customer and patient data must be safeguarded to meet ___ and ___? -privacy rights traced to which amendment? -organizations must implement strong governance practices surrounding what type of data can be collected, what disclosures to make, and what controls are in place to protect it

consumer privacy expectations and regulatory requirements -4th Amendment

Data analytics: customer and marketing analytics -build ____ and analyze ___ to optimize marketing strategies

consumer profiles, spending preferences

Big Data?

corporate accumulation of massive amounts of data that can be used for analysis, commonly referred to as data analytics

Waterfall charts show?

cumulative effect of a series of data points that make up a whole
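
The cumulative effect a waterfall chart plots can be computed as a running total. A minimal sketch follows. The function name and the figures are illustrative only.

```python
def waterfall_levels(start, changes):
    """Running total after each change; the last value is the ending whole."""
    levels = [start]
    for change in changes:
        levels.append(levels[-1] + change)
    return levels


# Hypothetical example: opening balance, then three increases or decreases.
levels = waterfall_levels(100, [40, -25, 10])
```

Each bar in the chart starts where the previous running total ended, so the final level is the whole that the individual data points add up to.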

Transformation: Validating Data -what does it ensure? -what kind of reviews?

Data is not lost or inappropriately modified during the cleaning process -visual review for simple steps, but large data sets may need basic statistical tests

Using data analytics: -financial analytics: monitor financial performance through __ and __ on a continuous basis

data mining, ratio analysis

An organization's governance program should be led by...? -design of program should have? -program must be ____

A designated individual, such as a chief privacy officer, corporate compliance officer, or equivalent job role -design should have the input of leaders across the organization -periodically updated as necessary

What do column charts show? (Bar chart)

effective at showing comparisons -easily show highest and lowest

Data -definition -forms?

fact, occurrence, instance, or an otherwise measurable observation -forms: numerical digits, alphanumeric text, images, video, and audio recordings

pyramid -understanding underlying ___ -most helpful when bottom layer represents an action or target that...

foundations -must first be achieved before the next layer can take place

symbol map - what is it? helps users?

A geographic map -the use of symbols helps users compare and contrast values

How to leverage power of data?

identify a data point, capture it, store it, protect it, and eventually dispose of it if appropriate

FIVE DIMENSIONS OF BIG DATA: VALUE -refers to ___ Big Data can yield -important to understand? why?

insights -understand the question or business problem that needs to be solved -not all data can be translated into actionable insights

directional charts show?

key events or milestones, with the earliest on the left and the latest on the right

Boxplots show?

lower and upper extremes, upper and lower quartiles, median, outliers
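
The values a boxplot displays can be computed directly. This sketch is not from the source material: it assumes the common 1.5 x IQR convention for flagging outliers and uses the "inclusive" quartile method from Python's statistics module. Other tools may use slightly different quartile definitions.

```python
import statistics


def boxplot_summary(values):
    """Five-number summary plus outliers, using the 1.5 * IQR rule (an assumption)."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "lower_extreme": inside[0],   # smallest value that is not an outlier
        "q1": q1,
        "median": median,
        "q3": q3,
        "upper_extreme": inside[-1],  # largest value that is not an outlier
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


summary = boxplot_summary([2, 4, 4, 5, 6, 7, 8, 9, 30])
```

In this made-up data set the value 30 falls above the upper fence, so it is reported as an outlier rather than as the upper extreme.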

Data dictionary -aka? -what is it?

Metadata -provides info about the data in a database -lists each attribute and denotes the features and limitations of the attribute

Data mart

much like a data warehouse, BUT more focused on a specific purpose such as marketing or logistics -subset of a data warehouse

what are the attributes that are not primary keys or foreign keys called

non key/descriptive attributes

Most important capabilities needed in visualization tools to support modeling and analysis?

promote versatility in using data and allow multiple types of visualizations to be created using one data set

FIVE DIMENSIONS OF BIG DATA: VOLUME -represents? -may also factor in the size of data in terms of ____

quantity or amount of data points -storage

FIVE DIMENSIONS OF BIG DATA: VARIETY -references? -types? and example of it

range of data types being processed or analyzed
1) STRUCTURED DATA -defined organizational format that has specific parameters (ex: numerical or alphabetical figures only) -example: relational database
2) SEMI-STRUCTURED DATA -hybrid of structured and unstructured -example: CSV... no restriction on size or length of data points (unstructured), but each data point is separated by a comma (structured)
3) UNSTRUCTURED DATA -format that does not have predefined parameters -generally lacks organization -example: online product reviews, audio, video, images

Using data analytics: -managerial and operational analytics: run in ____ to maximize efficiencies and production within an organization

real time

Scatter plots show?

relationships between two variables... simple regression to provide info on correlation
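
A minimal sketch of the simple regression and correlation that can sit behind a scatter plot, written with only the standard library. The function name and sample points are illustrative. It assumes paired lists of equal length in which x and y each vary.

```python
import math


def simple_regression(xs, ys):
    """Return (slope, intercept, r) for a least-squares line through paired points."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r


slope, intercept, r = simple_regression([1, 2, 3, 4], [3, 5, 7, 9])
```

An r near 1 or -1 indicates a strong linear relationship between the two variables, and an r near 0 indicates little or none.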

FIVE DIMENSIONS OF BIG DATA: VERACITY -represents ___, ___, or ___ of data. -high quality means? -this means?

reliability, quality, integrity -high quality: accurate and timely -means processes should be implemented so that data is cleansed of irregularities, including duplicate fields, missing fields, and incorrect, transposed, or mislabeled data
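
A cleansing process of this kind might look like the sketch below, which screens out exact duplicate records and records with missing required fields. The function name, the record layout, and the choice to treat None or an empty string as missing are all assumptions for illustration.

```python
def cleanse(records, required_fields):
    """Split records into clean rows and rejected rows (duplicates or missing fields)."""
    seen = set()
    clean, rejected = [], []
    for record in records:
        key = tuple(sorted(record.items()))  # identical records produce identical keys
        missing = [f for f in required_fields if record.get(f) in (None, "")]
        if missing:
            rejected.append((record, "missing: " + ", ".join(missing)))
        elif key in seen:
            rejected.append((record, "duplicate"))
        else:
            seen.add(key)
            clean.append(record)
    return clean, rejected


records = [
    {"id": 1, "name": "Acme"},
    {"id": 1, "name": "Acme"},  # exact duplicate
    {"id": 2, "name": ""},      # missing required field
]
clean, rejected = cleanse(records, ["id", "name"])
```

Keeping the rejected rows together with a reason, rather than silently dropping them, makes it possible to review what the cleansing step removed.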

Operational data store (ODS)

repository of transactional data from multiple sources and is often a source for data warehouses

pie charts show?

respective proportions of a whole value (a proportional breakdown)

Stacked column chart show? effective when?

similar to a column chart, but each column is stratified to show additional details -effective when you want total comparisons as well as % breakdowns of the whole

ETL: Loading -final step of the process that loads data into ___ or ___ -main concern... has it been extracted and transformed into a format that is...

software program for analysis or data storage location -compatible with the software program or destination

Manual extracting data... use?

specialized data mining software or write customized queries to obtain data

FIVE DIMENSIONS OF BIG DATA: VELOCITY -refers to? -what has higher velocity than others?

the speed of data accumulation or data processing -data accumulated or processed on a continuous basis (more frequently)

What is one of the most time-consuming parts of the ETL process? Steps of doing this?

transformation step. Clean, validate, manipulate

Dot plots show?

two-dimensional mapping of observations onto a coordinate plane; shows the frequency of observations along the other dimension

Transformation: manipulating data -after being cleaned and validated, it can be supplemented, enhanced, or otherwise manipulated in a way that adds __ to existing data points -what are examples of common manipulations?

value
-appending demographic and socioeconomic data
-creating new variables that are a function of existing ones (variable A x C)
-creating new variables that classify or categorize existing variables (group customers in a given location into geographical categories)
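
The last two manipulations can be sketched as follows. The column names (units, unit_price, state) and the region lookup are hypothetical, standing in for whatever variables a real data set contains.

```python
def enrich(customers, region_by_state):
    """Return new rows with a derived variable and a categorical variable added."""
    enriched = []
    for row in customers:
        new_row = dict(row)  # copy so the cleaned source rows are left unchanged
        # New variable that is a function of existing ones.
        new_row["revenue"] = row["units"] * row["unit_price"]
        # New variable that categorizes an existing one.
        new_row["region"] = region_by_state.get(row["state"], "Unknown")
        enriched.append(new_row)
    return enriched


customers = [{"state": "TX", "units": 3, "unit_price": 2.5}]
enriched = enrich(customers, {"TX": "South"})
```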

Data warehouse -definition -pulls data from?

very large data repositories that are centralized and used for reporting and analysis rather than transaction purposes. PREDEFINED SCHEMA... ENABLES QUICK PROCESSING AND ANALYSIS -pulls data directly from enterprise systems with transactional data or from an ODS

Automated Extraction -will highly use what?

API: application programming interface -extraction is just a matter of user application accessing the API to obtain the source data -two applications communicate with each other

Common prescriptive analytic techniques use decision support systems to assist in strategic decision making as follows....

Artificial intelligence, scenario modeling

What addresses the challenges that come with big data - such as ethical, legal, employee, customer, and stakeholder concerns?

Big Data governance program and policies -provide guidance on how sensitive data should be captured, maintained, and disposed of during its life cycle within the company's possession

Types of data analytics: -for each type, list, define, and explain value/complexity and whether forward or backward looking

Descriptive analytics: -describing or explaining WHAT has occurred within a given attribute or attributes -least valuable and complex -backward looking
Diagnostic analytics: -diagnosing or explaining WHY it occurred -second least complex -backward looking
Predictive analytics: -predicting what will occur; forecasts based on historical data -second most complex -forward looking
Prescriptive analytics: -prescribing what COULD or SHOULD occur -next course of action to achieve an outcome -most complex -forward looking

What is ETL? (abbreviation) And what is it the process of?

Extract, Transform, Load -the process by which data is captured from its source and transferred to an organization's custody so that it can then be further analyzed

Five dimensions of Big Data

FIVE VS: Volume Velocity Variety Veracity Value

Extracting Data: -If submitting request for data, what must happen?

The recipient of the request must be provided with full details on what is needed, including data file type, format, time period and required attributes

