Unit 3 CIS

Dirty data

is erroneous or flawed data. The complete removal of dirty data from a source is impractical or virtually impossible.

Internal databases

data sources inside the company, such as the Marketing, Sales, Inventory, and Billing departments.

external databases

data sources outside the company, such as competitor information, industry information, mailing lists, and stock market analysis.

Appropriate methods for assigning variables in R

• X <- 42
• 42 -> X
• X = 42
In the arrow forms, the value goes INTO the variable the arrow points at.

Metadata

provides details about data. For example, metadata for an image could include its size, resolution, and date created.

Attributes

(also called columns or fields) are the data elements associated with an entity

entity

(also referred to as a table) stores information about a person, place, thing, transaction, or event.

double-spend

A scenario in the Bitcoin network where someone tries to send the same bitcoin to two different recipients at the same time. Once a bitcoin transaction is confirmed, it becomes nearly impossible to double spend it. The more confirmations a particular transaction has, the harder it becomes to double spend the bitcoins.

Infrastructure as a Service (IaaS)

delivers hardware networking capabilities, including the use of servers, networking, and storage, over the cloud using a pay-per-use revenue model. The customer rents the hardware and provides its own custom applications or programs. IaaS customers save money by not having to spend a large amount of capital purchasing expensive servers.

blockchain

is a distributed ledger that provides a way for information to be recorded and shared by a community. The blockchain serves as a historical record of all transactions that ever occurred, from the genesis block to the latest block. Each block contains data, its own hash, and the hash of the previous block.
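
The block structure described above (data, a hash, and the previous block's hash) can be sketched in a few lines of Python. This is a toy illustration only, not how Bitcoin is implemented: the field names, the use of JSON and SHA-256, and the helper names `block_hash`, `add_block`, and `is_valid` are all assumptions made for the example.

```python
import hashlib
import json


def block_hash(index, data, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_block(chain, data):
    """Append a block holding its data, its own hash, and the previous hash."""
    # The genesis block has no predecessor, so a placeholder value is used.
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append({
        "index": index,
        "data": data,
        "prev_hash": prev_hash,
        "hash": block_hash(index, data, prev_hash),
    })


def is_valid(chain):
    """Recompute every hash and check each link back to the genesis block."""
    for i, block in enumerate(chain):
        expected = block_hash(block["index"], block["data"], block["prev_hash"])
        if block["hash"] != expected:
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


chain = []
add_block(chain, "genesis")
add_block(chain, "Alice pays Bob 1")
add_block(chain, "Bob pays Carol 1")
print(is_valid(chain))  # True

chain[1]["data"] = "Alice pays Bob 100"  # tamper with an old transaction
print(is_valid(chain))  # False
```

Because each block's hash covers the previous block's hash, changing any old block breaks the check for the whole chain.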

Bitcoin

uses encryption to maintain the integrity of transactions, which is why it is called a cryptocurrency. It is peer-to-peer: every user is allowed to connect to the network, send new transactions to it, verify transactions, and create new blocks.

Advantages of Blockchains

• Distributed: it works as a shared form of record-keeping, ensuring no one person or organization holds ownership.
• Permissioned: everyone in the process has a copy of every record and piece of data, and no transaction can be added to the chain without consensus across the participants. No one person can add to or alter the blockchain without the change being permanently recorded, which makes it tamper-resistant and highly secure.
• Secure: this process eliminates the risk of fraud and error; no one, not even a system administrator, can delete a record.

information scrubbing

A process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information

boolean

A single value of either TRUE or FALSE

system restore

A utility in Windows that restores system settings to a specific previous date when everything was working properly.

Systems thinking

A way of monitoring the entire system by viewing multiple inputs being processed or transformed to produce outputs while continuously gathering feedback on each part.

cryptocurrency

a digital currency, such as Bitcoin, that uses encryption to maintain the integrity of transactions.

Hash and Digital Signature (thus the label 'crypto-currency')

Computer Science and advanced mathematics (in the form of cryptographic hash functions) protect the blockchain's integrity and anonymity. Each transaction has a digital hash calculated and attached. The hash includes digital signatures from the existing blockchain as well as the new transaction. In this way each block confirms the integrity of all prior blocks and thereby prevents falsification or manipulation of prior transactions.

data analytics

The science of examining raw data with the purpose of drawing conclusions about that information

The central components of blockchain are

Digital Ledger, Hash, Miners, Decentralized, Double spend, and peer-to-peer.

Decentralization (decentralized)

Each node in the participating computer network has a full copy of the digital ledger.

site license

Enables any qualified users within the organization to install the software, regardless of whether the computer is on a network. Some employees might install the software on a home computer for working remotely.

feed-forward loop

Focused on regulating input: monitor input variation and then adjust the process to compensate.
• Feedforward control measures one or more inputs of a process, calculates the required value of the other inputs, and then adjusts them.
• Feedforward control has to predict the output because it does not measure it, so it is sometimes called predictive control.
• Feedforward control does not check how the input adjustments worked in the process, so it is referred to as open-loop control.
It is about constantly making marginal changes as we go, which is very hands-on and intensive (arguably less the creation of a system than the creation of a recipe). System management (regulating the process) sits within system design (rearranging or replacing tools), which sits within global system design (integrating with fellow citizens).

feedback loop

Focused on regulating the process: at the end of each step (or of the last step), observe the output and adjust the process as required.
• Feedback control measures the output of a process, calculates the error in the process, and then adjusts one or more inputs to get the desired output value.
• Feedback control reacts only to the process error (the deviation between the measured output value and the set point), so it is called reactive control.
• Feedback control measures the output and verifies the adjustment results, so it is called closed-loop control.
It is about setting up the process, then doubling back and changing how things were done.
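
The contrast between feedforward (open-loop) and feedback (closed-loop) control can be sketched with a toy process. Everything here is an assumption made for illustration: the linear `process` model, its gain of 2.0, the correction factor `k`, and the function names are hypothetical, not taken from the course material.

```python
def process(control, disturbance, gain=2.0):
    """Toy process: output depends on a controllable input and a disturbance."""
    return gain * control + disturbance


def feedforward_control(setpoint, disturbance, assumed_gain):
    """Open loop: measure an input (the disturbance) and predict the control.

    The output is never measured, so an inaccurate model goes unnoticed.
    """
    return (setpoint - disturbance) / assumed_gain


def feedback_control(setpoint, disturbance, steps=20, k=0.25):
    """Closed loop: measure the output, compute the error, adjust, repeat."""
    control = 0.0
    for _ in range(steps):
        output = process(control, disturbance)
        error = setpoint - output  # deviation from the set point
        control += k * error       # reactive correction of the input
    return process(control, disturbance)


# Feedforward hits the target only when its model of the process is right.
print(process(feedforward_control(10, 4, assumed_gain=2.0), 4))  # 10.0
print(process(feedforward_control(10, 4, assumed_gain=3.0), 4))  # 8.0

# Feedback converges on the set point by checking its own results.
print(round(feedback_control(10, 4), 3))  # 10.0
```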

Programming Functions, in R

Functions are sometimes also called procedures. Using food and meals as an analogy, a recipe for baking chocolate chip cookies would be its algorithm.

Structured Query Language (SQL)

code written to answer questions against a database. MIS professionals have the skills required to code SQL. The most common SQL command begins with the word SELECT.

query-by-example (QBE)

helps managers graphically design the answer to a question against a database; no coding required.

Miners

Transactions are authenticated by a network of 'miners' who complete complex mathematical problems.

Lists

Vector (multiple elements together): in R, use c(1, 2) to create a vector whose elements can be accessed individually or operated on all at once.

foreign key

a primary key of one table that appears as an attribute in another table. Used to connect the separate tables.
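
A small sketch of a foreign key connecting two tables, using Python's built-in `sqlite3` module. The `customers` and `orders` tables and their rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,   -- primary key of customers
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),  -- foreign key
        item        TEXT NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES
        (10, 1, 'keyboard'), (11, 2, 'monitor'), (12, 1, 'mouse');
""")

# The foreign key lets a SELECT join each order to the customer who placed it.
rows = conn.execute("""
    SELECT customers.name, orders.item
    FROM orders
    JOIN customers ON orders.customer_id = customers.customer_id
    ORDER BY orders.order_id
""").fetchall()
print(rows)  # [('Ada', 'keyboard'), ('Grace', 'monitor'), ('Ada', 'mouse')]
```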

system

a set of components that interact to achieve some purpose or goal

DBMS models for organizing information

hierarchical, network, and the relational database

Multitasking

allows more than one piece of software to be used at a time

Advantages of bitcoin

• Any well-connected node in the bitcoin blockchain can determine, with certainty, whether a transaction does or does not exist in the data set.
• The cost of attempting to rewrite or alter transaction history is prohibitively high.

Distributing Application Software

Application software can be distributed in these ways:
• Single user license: restricts use to one user at a time.
• Network user license: anyone on the network can install and use the software.
• Site license: anyone in the organization can install the software (not just on the network).
• Application service provider license: a pay-as-you-go arrangement for specialty software, paid per use, per download, or per license.

Software as a Service (SaaS)

delivers applications over the cloud using a pay-per-use revenue model.

intangible benefits

are difficult to quantify or measure.

Tangible benefits

are easy to quantify and typically measured to determine the success or failure of a project.

Data models

are logical data structures that detail the relationships among data elements using graphics or pictures

Algorithms

are mathematical formulas placed in software that perform an analysis on a data set.

integrity constraints

are rules that help ensure the quality of information (keep data trustworthy). There are two types of integrity constraints: (1) relational and (2) business-critical.
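
As a rough sketch of the two kinds of constraint, the example below uses Python's built-in `sqlite3` module. The tables are hypothetical, and treating "quantity must be positive" as a business-critical rule is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE order_lines (
        line_id    INTEGER PRIMARY KEY,
        -- relational rule: every line must point at an existing product
        product_id INTEGER NOT NULL REFERENCES products(product_id),
        -- business rule (assumed): an order line needs a positive quantity
        quantity   INTEGER NOT NULL CHECK (quantity > 0)
    );
    INSERT INTO products VALUES (1, 'widget');
""")


def try_insert(product_id, quantity):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO order_lines (product_id, quantity) VALUES (?, ?)",
            (product_id, quantity),
        )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(1, 5))   # True: valid row
print(try_insert(99, 5))  # False: product 99 does not exist
print(try_insert(1, 0))   # False: quantity must be positive
```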

redundant data

can cause storage issues along with data integrity issues, making it difficult to determine which values are the most current or most accurate.

comparative analysis

can compare two or more data sets to identify patterns and trends.

data dictionary

a collection of metadata. Ex: the references and timestamps of edits at the bottom of a Wikipedia page.

processing

the computer program that processes the data. Ex: Cook the patty and put the ingredients together.

data mart

contains a subset of data warehouse information.

input

data that is entered into a computer. Ex: Getting lettuce, tomatoes, patty, bun, ketchup.

information integrity

dependability and trustworthiness of information. More specifically, it is the accuracy, consistency and reliability of the information content, processes and systems

Platform as a Service (PaaS)

deployment of entire systems, including hardware, networking, and applications, using a pay-per-use revenue model. Every aspect of development, including the software needed to create it and the hardware to run it, lives in the cloud.

network user license

enables anyone on the network to install and use the software

peer-to-peer

every user is allowed to connect to the network, send new transactions to it, verify transactions, and create new blocks

data scientist

extracts knowledge from data by performing statistical analysis, data mining, and advanced analytics on big data to identify trends, market changes, and other relevant information

True about function in R

functions in R send their results (their output) to the console, functions have to be defined before they can be used, and the expressions (commands of logic) for a function are typically contained within braces {}.

data warehouse

a logical collection of information, gathered from many different operational databases, that supports business analysis activities and decision-making tasks. Its primary purpose is to combine information, more specifically strategic information, throughout an organization into a single repository in such a way that the people who need that information can make decisions and undertake business analysis. Key idea: a universal querying tool.
• Combines strategic information from many sources (other operational databases) into one place
• Standardizes data

Source data

identifies the primary location where data is collected.

hybrid cloud

includes two or more private, public, or community clouds, but each cloud remains separate and is only linked by technology that enables data and application portability. For example, a company might use a private cloud for critical applications that maintain sensitive data and a public cloud for nonsensitive data applications.

Digital Ledger

is a bookkeeping list of assets (money, property, ideas...), identified ownership, and transactions that record the transfer of ownership among participants. All transactions are recorded with a date, time, participant names and other information. It is a linear list to which information can only be added, with older records retained to preserve the full history of each asset.

data artist

is a business analytics specialist who uses visual tools to help people understand complex data. Great data visualizations provide insights into something new about the underlying patterns and relationships.

repository

is a central location in which data is stored and managed.

outlier

is a data value that is numerically distant from most of the other data points in a set of data.
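
One common way to make "numerically distant" concrete is to compare each value's distance from the median with the typical distance (the median absolute deviation). The sketch below uses that heuristic; the cutoff `k=5.0`, the sample data, and the function name `find_outliers` are assumptions for illustration, and other rules (z-scores, interquartile range) are equally valid.

```python
import statistics


def find_outliers(values, k=5.0):
    """Flag values more than k median-absolute-deviations from the median."""
    center = statistics.median(values)
    deviations = [abs(v - center) for v in values]
    mad = statistics.median(deviations)  # typical distance from the center
    if mad == 0:
        # Most values are identical; anything different stands out.
        return [v for v in values if v != center]
    return [v for v, d in zip(values, deviations) if d > k * mad]


data = [10, 12, 11, 13, 12, 95]
print(find_outliers(data))  # [95]
```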

Information cleansing or scrubbing

is a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information.

subsystem

is a system that functions as a component of another

barrier

is a type of synchronization method. A barrier for a group of threads or processes in the source code means that every thread or process must stop at that point and cannot proceed until all the other threads or processes reach the barrier.
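
A minimal sketch of a barrier using Python's standard `threading.Barrier`. The worker function and the number of threads are arbitrary choices for the example.

```python
import threading

NUM_WORKERS = 4
barrier = threading.Barrier(NUM_WORKERS)
events = []  # list.append is atomic in CPython, so no extra lock is used here


def worker(worker_id):
    events.append(("before", worker_id))
    barrier.wait()  # stop here until all NUM_WORKERS threads have arrived
    events.append(("after", worker_id))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# No thread gets past the barrier until every thread has logged "before".
phases = [phase for phase, _ in events]
print(phases == ["before"] * NUM_WORKERS + ["after"] * NUM_WORKERS)  # True
```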

Data-driven decision management

is an approach to business governance that values decisions that can be backed up with verifiable data.

Fast data

is the application of big data analytics to smaller data sets in near-real or real-time in order to solve a problem or create business value

Prescriptive Analytics

is the area of business analytics (BA) dedicated to finding the best course of action for a given situation

Data aggregation

is the collection of data from various sources for the purpose of data processing

Virtualization

is the creation of a virtual (rather than actual) version of computing resources, such as an operating system, a server, a storage device, or network resources.

information redundancy

is the duplication of data, or the storage of the same data in multiple places.

Predictive Analytics

is the practice of extracting information from existing data sets in order to determine patterns and predict future outcomes and trends

Infographics (information graphics)

present the results of data analysis, displaying the patterns, relationships, and trends in a graphical format.

Anomaly detection

is the process of identifying rare or unexpected items or events in a data set that do not conform to other items in the data set

Analytics

is the science of fact-based decision making.

What is true about algorithms

An algorithm is a method for doing something: a list of steps to follow (a well-defined procedure or formula) in order to solve a problem. Its instructions should be unambiguous (clear, with no room for subjective interpretation).

Hash

a bundle of letters, numbers, and symbols that stands in for the original value (such as a password) and hides its true identity. It is usually generated by an algorithm.

database

maintains information about various types of objects (inventory), events (transactions), people (employees), and places (warehouses)

Database Management System (DBMS)

manages data (creates, reads, updates, and deletes) in a database while controlling access and security. There are two primary tools available for retrieving information from a DBMS:
• query-by-example (QBE)
• structured query language (SQL)

Management Information System

moves information across the company to facilitate decision making and problem solving, by incorporating systems thinking to help companies operate cross-functionally.

software updates (patch) or software upgrades

occur when the software vendor releases updates to software to fix problems or enhance features

Software upgrade

occurs when the software vendor releases a new version of the software, making significant changes to the program

Analysis paralysis

occurs when the user goes into an emotional state of over-analysis (or over-thinking) a situation so that a decision or action is never taken, in effect paralyzing the outcome.

Distributed computing

processes and manages algorithms across many machines in a computing environment. A key component of big data is a distributed computing environment that shares resources ranging from memory to networks to storage.

Public cloud

promotes massive, global, and industry-wide applications offered to the general public. Customers are never required to provision, manage, upgrade, or replace hardware or software. Pricing is utility-style; customers pay only for the resources they use.

single user license

restricts the use of the software to one user at a time

community cloud

serves a specific community with common business models and security requirements, for example in highly regulated industries such as financial services and pharmaceutical companies. Community clouds are private but spread over a variety of groups within one organization. Ex: all Colorado state government organizations.

private cloud

serves only one customer or organization and can be located on or off the customer's premises. It offers high data security but is expensive.

application service provider license

specialty software paid for on a license basis or per-use basis or usage-based licensing

Relational database

stores information in the form of logically related two-dimensional tables. Advantages:
• looks to all of the data to find the data that is needed
• the ability to scale the database to the size of a very large organization
• the ability to access, update, and share information among many user stations
• advanced capabilities for analyzing and reporting

tools

systems and applications that are used to reach an end goal

information cube

term for the representation of multidimensional information. Information cubes are a common arrangement of business data suitable for analysis from different perspectives through operations like slicing, dicing, pivoting, and aggregation.
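
Slicing and aggregation can be sketched with a plain dictionary keyed by dimension values. The sales figures, the three dimensions, and the helper names `slice_cube` and `aggregate` are hypothetical; real analysis tools store and query cubes very differently.

```python
from collections import defaultdict

# Hypothetical sales facts keyed by three dimensions: product, region, quarter.
DIMENSIONS = ("product", "region", "quarter")
cube = {
    ("widget", "east", "Q1"): 100,
    ("widget", "west", "Q1"): 80,
    ("widget", "east", "Q2"): 120,
    ("gadget", "east", "Q1"): 50,
    ("gadget", "west", "Q2"): 70,
}


def slice_cube(cube, dimension, value):
    """Slice: keep only the cells where one dimension has a fixed value."""
    axis = DIMENSIONS.index(dimension)
    return {key: amount for key, amount in cube.items() if key[axis] == value}


def aggregate(cube, dimension):
    """Aggregate: total the measure for each value of one dimension."""
    axis = DIMENSIONS.index(dimension)
    totals = defaultdict(int)
    for key, amount in cube.items():
        totals[key[axis]] += amount
    return dict(totals)


print(aggregate(cube, "product"))
# {'widget': 300, 'gadget': 120}
print(aggregate(slice_cube(cube, "quarter", "Q1"), "region"))
# {'east': 150, 'west': 80}
```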

Open systems

the input triggers the process and the process controls the output

data mining

the process of analyzing data to extract information not offered by the raw data alone

Output

the resulting information from the computer program. Ex: Hamburger.

data element (or a data field)

the smallest or basic unit of information

How to differentiate between data mart and data warehouse

think of data warehouses as having a more organizational focus and data marts as having focused information subsets particular to the needs of a given business unit such as finance or production and operations.

primary key

uniquely identifies each record/row in a table (for example, a Panther ID). Used as a lookup ID to search the data for a specific individual.

competitive monitoring

where a company keeps tabs on its competitors' activities on the web using software that automatically tracks all competitor website activities such as discounts and new products.

Extraction, transformation, and loading (ETL)

a process that extracts information from internal and external databases, transforms the information using a common set of enterprise definitions, and loads the information into a data warehouse. The data warehouse then sends subsets of the information to data marts.
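
The three ETL steps can be sketched in plain Python. The source rows, their field names, the "common definition" (a title-case name plus a numeric amount), and the rule used to build the data mart are all made up for illustration.

```python
# Hypothetical sources: an internal billing system and an external mailing list.
internal_rows = [{"cust": "Ada Lovelace", "amt_usd": "120.50"}]
external_rows = [{"FullName": "GRACE HOPPER", "Spend": "80"}]


def extract():
    """Extract: pull raw rows from every source, tagged with where they came from."""
    tagged = [("internal", row) for row in internal_rows]
    tagged += [("external", row) for row in external_rows]
    return tagged


def transform(source, row):
    """Transform: map each source's fields onto one common definition."""
    if source == "internal":
        name, amount = row["cust"], row["amt_usd"]
    else:
        name, amount = row["FullName"], row["Spend"]
    return {"name": name.title(), "amount": float(amount), "source": source}


def load(warehouse, records):
    """Load: append the standardized records to the warehouse."""
    warehouse.extend(records)


warehouse = []
load(warehouse, [transform(source, row) for source, row in extract()])

# The warehouse then feeds a subset of its records to a data mart.
data_mart = [record for record in warehouse if record["amount"] >= 100]

print([record["name"] for record in warehouse])  # ['Ada Lovelace', 'Grace Hopper']
print(data_mart)
# [{'name': 'Ada Lovelace', 'amount': 120.5, 'source': 'internal'}]
```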

data mining analysis methods

• Data profiling: the process of examining the data available from an existing information source (e.g., a database or a file) and collecting statistics or informative summaries about that data.
• Data replication: the process of sharing information to ensure consistency between multiple data sources.
• Recommendation engine: a data-mining algorithm that generates recommendations from analyzing customers' actions.

Data Mining Techniques

• Estimation analysis: estimates continuous variable behavior or an estimated future value.
• Affinity grouping analysis: reveals the relationship between variables along with the nature and frequency of the relationships.
• Cluster analysis: identifies similarities and differences among data sets, allowing similar data sets to be clustered together. It divides information sets into mutually exclusive groups such that the members of each group are as close together as possible and the different groups are as far apart as possible.
• Classification analysis: organizes data into categories or groups for its most effective and efficient use. The goal is not to explore the data for interesting segments but to decide the best way to classify records. Like cluster analysis, it puts data into groups; the difference is that classification analysis requires all classes to be defined before the analysis begins.
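
As a small illustration of affinity grouping, the sketch below counts how often pairs of items appear together in the same purchase. The baskets are invented, and simple pair counting is only one basic way to surface the "frequency of the relationships"; real market-basket tools go further.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per purchase.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "milk"},
]


def pair_counts(baskets):
    """Count how many baskets contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every pair one canonical (alphabetical) order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(baskets)
print(counts[("bread", "butter")])  # 2
print(counts[("bread", "milk")])    # 2
print(counts[("cereal", "milk")])   # 1
```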

