ISTM 210 Test Three


Decision Support Systems (DSS)

computer-based systems that support an organization's decision-making activities; they emerged in the 1950s. Examples are Excel and Access. Some are more user friendly than others: Tableau is a DSS that is more graphical and therefore more user friendly. For example, loan officers at a bank use a DSS to verify the credit of a loan applicant. A simple spreadsheet is a DSS when it calculates possible revenues and expenses to help someone decide whether a start-up business is viable

business analysis

field of business analytics (BANA old name, still what many do BA)

Table

file in a database that is related to other tables

Shuffle

puts things together that are related (like transform), ex: categorizing words with different sentiment categories (positive, negative, neutral) together in groups

Decision/prescriptive analytics

recommends courses of action and shows the likely outcome of each. A form of business analytics that builds on Predictive Analytics to make decisions about future industries and marketplaces. Decision Analytics looks at an organization's internal data, analyzes external conditions like supply abundance, and then endorses a best course of action

Structured Data

resides in fixed formats. This data is typically well labeled and often with traditional fields and records of common data tables. Structured data doesn't necessarily have to be "table-like," but needs to at least have recognizable patterns that allow it to be more easily queried, searched, and in a standard format

Descriptive analytics

reviews past events, analyzing and reporting what happened; more of a REACTIVE strategy. A form of business analytics and the baseline on which other types of analytics are built. Descriptive Analytics defines past data you already have that can be grouped into significant pieces, like a department's sales results, and also starts to reveal trends

Data validation

rules that help ensure data integrity. Data integrity techniques attempt to avoid data input errors like typing mistakes
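
A validation rule can be sketched as a database constraint. The example below is a minimal illustration using Python's built-in `sqlite3` module; the `Applicant` table, its fields, and the 300 to 850 score range are assumptions made up for the example, not something taken from the course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: the CHECK clause is the validation rule.
conn.execute(
    """
    CREATE TABLE Applicant (
        Name TEXT NOT NULL,
        CreditScore INTEGER CHECK (CreditScore BETWEEN 300 AND 850)
    )
    """
)

def try_insert(name, score):
    """Return True if the row passes validation, False if it is rejected."""
    try:
        conn.execute("INSERT INTO Applicant VALUES (?, ?)", (name, score))
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert("Pat", 720)    # a plausible score
rejected = try_insert("Sam", 7200)   # a typing mistake (extra zero)
print(accepted, rejected)
```

The mistyped score never reaches the table, which is the point of a validation rule: the error is caught at data entry instead of in a later report.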

Entity

something you want to keep data about, ex: every student/building/faculty/class at A&M is an entity, represented by a rectangle

Marc Benioff

Salesforce.com

Database schema

a "map" of data tables and their relationships to one another

Data Warehouse

a collection of data from a variety of sources used to support decision making and generate business intelligence; basically a database plus a whole lot more. It is where an organization stores and consolidates disparate data in a central location, and it could hold yottabytes of data. ex: Oracle, IBM, SAS, Teradata. You can slice and dice (go across products, years/months, and geographic locations) or data mine (dig through the data to give recommendations, like Netflix does)

Field

a column in a database; represents a category. These categories, called attributes (student ID, birthdate, etc.), are stored in the columns and are represented by a circle

Entity Relationship Diagram (ERD)

a database-modeling method used to construct a theoretical and conceptual representation of data to produce a schema. Simply stated, an ERD is a picture of a database's tables and how they relate to each other. If the min (minimum cardinality) of a relationship is 0, the relationship is either optional (you don't have to have a company car) or a matter of timing (it hasn't been bought/used yet, ex: contact lenses for cats, books in Evans library, snowmobiles in downtown Bryan)

Record

a row in a database, holds the data corresponding to each field

Querying

a tool that lets you ask your data questions that in turn lead to answers, and eventually decisions

Database Management Systems (DBMS)

a well thought-out collection of computer files, the most important of which are called tables (which are related). Tables are where a database holds data. Tables consist of records (rows) divided into fields/attributes (columns) that can be queried (questioned) to produce subsets of information. The records retrieved by these queries become information that can be used to make business decisions. Database systems are similar to filing cabinets in that they organize information based on a set of rules. Database designers need to know a business: they begin by finding out what the business does and how it does it, and the business decides what's important to keep track of and what to put into the database. Designers find out what's unique to each customer (first name, last name) and include those items as fields/columns under customer information. Database software is different from Excel or Word in that it requires the end user to create their own files, starting with tables and then relating them to each other. Once the table structure is complete, the user needs to understand the possible valid entries in a table's field (column). A DBMS can store information securely by giving certain end users full access to data and other end users limited access to the same data; different levels of security are implemented by determining who gets to see what in a business environment

Primary key

an attribute that uniquely identifies a row in a table, ex: CRN/UIN

Hadoop

an open source technology (nobody owns it, so you can use it in your own setting) that allows us to handle big data. It is an infrastructure for storing and processing large sets of data across multiple servers, and its structure allows companies to catalog larger amounts and wider varieties of data, including unstructured data such as videos and social media posts. Instead of keeping centralized files (your oceans) in one place like a Data Warehouse or Datamart, Hadoop uses a cluster system that allows files to be stored on multiple servers. Unlike storing oceans of water (centralized data), Hadoop attempts to identify lakes and rivers (data files on multiple other servers). It is flexible enough to allow one query to search through multiple servers, but it is not the best for on-the-fly (ad hoc) queries and is very difficult to implement and run

Business analytics

attempts to make connections between data so organizations can try to predict future trends that may give them a competitive advantage. Business analytics can also uncover computer system inadequacies within an organization

Normalized relational database

- data is consistent
- redundancy is minimized and controlled
- attributes appear multiple times only when they function as foreign keys (which link tables together)
- the referential integrity rule ensures there will be no update anomaly problem with foreign keys
- limits insert, delete, and update anomalies
- avoids data redundancy
The nature and structure of a well thought-out database management system helps a business avoid data redundancy

Zach scenario

2 questions on test about it

Terms of Service (TOS)

CSPs, like Amazon, require their Cloud customers to agree to their "Terms of Service" (TOS) before they are willing to let you port your data and information to their Cloud. Does this mean that you are giving all rights and access to everything you load onto the Amazon Cloud to outside observers to sift through? The answer is clearly "yes."

Apache OpenOffice

Deciding to go with SaaS as an organization can be a scary process as it means leaving traditional business tools like Microsoft Office which is a very robust and powerful business software suite. Can MS Office really be replaced when some consider it an essential business tool? Apache OpenOffice is an open-source business software suite that has a word processor like Microsoft Word, a spreadsheet like Excel, presentation software like PowerPoint, and a database like Access. It used to be called OpenOffice.org and was administered by Sun Microsystems, then ownership went to Oracle, and it is now in the hands of Apache where it is known by its current name, "OpenOffice." All OpenOffice applications are robust business tools and are absolutely free.

Data redundancy

For instance, only a singular table holds personal information about a person. A person's unique information, like name and address information, resides in just one place in the entire DBMS. This can be both an advantage and disadvantage in a database. Reports and forms that use a person's name from a singular table will always display that name the same way, even if it is wrong. That means that if a DBMS consists of 23 forms and 124 reports, that include a person's name, the name will always display the same way. If the name is wrong, it only has to be corrected in one singular place; on the table that holds the "name" data and all forms and reports will display the name correctly. Avoiding data redundancy ensures accuracy system wide throughout a DBMS, and that can reduce reconciliation errors immensely

Who benefits from big data?

big data just gets bigger and more useful. Some businesses have been collecting data for a while and only now know how to make sense of it.
retail - CRM, optimal store locations/layouts
financial services - banks, credit cards, mortgage institutions can run risk analysis, fraud protection
advertising - target ads to certain customers based off of data collected about demographics
government - security, counter-terrorism, FBI hires accountants to follow money trail
energy - smart grid, sensors
healthcare - risk factors, genome research

Social Media Platforms

can uncover what your customers are thinking

Extract

collect data, like tweets, phone conversations, social media comments, etc. Once you've determined where your data resides, you can start extracting it, often from Customer Relationship Management (CRM) or Enterprise Resource Planning (ERP) software. One of an ERP's main functions is to centralize an organization's data so that it ends up being a wealth of data with value across the organization. The extraction step sometimes converts unstructured data, like text notes, to semi-structured or structured data by tagging it with metadata. For instance, tagging with metadata could mean extracting country names from an unstructured sales database, loading them into a column, and labeling it "Country Names"

Unstructured Data (80% of all data)

disorganized data that cannot be easily read or processed by a computer because it is not stored in rows and columns like traditional data tables. Imagine collecting massive amounts of Facebook messages and Instagram posts to determine future fashion trends. Organizations collect videos, documents, and all manner of fragmented data to eventually make sense of it. A whopping 80% of all data is unstructured

Report

display database information on screen or on paper

Foreign key

links tables together; an attribute in one table that is a primary key in another. Each one-to-many relationship needs a foreign key to link the two tables

Dashboards

easy-to-use graphical interfaces that characterize specific data analysis through visualization. Not everyone wants to deal with BI programming using Structured Query Language (SQL) or other programming tools. Dashboards make it a lot easier to make sense of the data and see the resulting information

Map

gets data ready to be analyzed (like extract), ex: sentiment analysis from unstructured data (5 stars)

Jen Mazzo

google employee/mom

Marketing Automation Services system

has endless customer facts and figures

Update anomaly

having to update records/fields in multiple places

Tableau

helps people see and understand data, helped Texas Rangers save money

Business Intelligence

historical, current, and predictive data used to help people make decisions in business. Refers to an assortment of software applications used to analyze an organization's raw data. BI can be described as computer applications that change data into significant, meaningful information that helps organizations make better decisions. Keep in mind that data is raw, unorganized facts, and information is essentially processed data that means something. There will be 1.7 Mb of data per second per person by 2020. ex: an auto parts store with "just right" inventory would be far more profitable and flexible, making BI a good return on investment (ROI)

File

holds all of the fields and records. also called a table, since it's a table full of records, looks similar to a spreadsheet

Disadvantages of cloud drives

if the internet goes down, the Cloud is not available. since a Cloud Drive is "somewhere" on the Internet, it begs the question: who else has access to my Cloud Drive and computer files? Privacy groups have raised concerns about the Cloud because your Cloud Service Provider (CSP) can access your data and information stored on their Cloud. Could the CSP unintentionally or even purposely alter or even delete your data and information? the answer is clearly "yes". a CSP asks permission to use all of your data stored on the Cloud, which means they look at it. What if an organization is without Internet for a day? Does this mean their cloud is unavailable? The answer is "yes," which is another disadvantage of the cloud. Another disadvantage is that although your Cloud can be accessed from virtually any Internet enabled computer, the application software needed to run a specific file, like a spreadsheet or word processing document, needs to be present. Also, can a small business really leverage Cloud tools where its employees work in a virtual environment and may never even see, or even meet each other face-to-face? Many would argue that this model is not sustainable because people need to be managed at a more personal level to have a more coordinated business effort as opposed to twenty-five independent people scattered in separate locations. Large businesses that lease office space may not want to leverage the Cloud because they essentially pay for a workspace for their employee(s) five days a week. If they let employees work from home two of those days, it doesn't mean they enjoy cost savings on office space, as they still have to pay for all five days.

Customer Relationship Management (CRM) system

an important component of an ERP (enterprise resource planning, the running of a business); used to track and organize communications with customers. Contains information about sales, marketing, customer service records, and much more. ex: Siebel system

Semi-structured data

lands somewhere in-between Structured and Unstructured data and can possibly be converted into structured data, but not without a lot of work

crosstab query

like a pivot table
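
A crosstab can be sketched in a few lines of plain Python: one field becomes the row headings, another becomes the column headings, and a third is summarized in the cells. The sales records and field names below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sales records: (region, year, amount)
sales = [
    ("North", 2019, 100),
    ("North", 2020, 150),
    ("South", 2019, 80),
    ("South", 2020, 120),
    ("North", 2020, 50),
]

def crosstab(rows):
    """Sum amounts with regions as row headings and years as column headings."""
    table = defaultdict(lambda: defaultdict(int))
    for region, year, amount in rows:
        table[region][year] += amount
    # Convert the nested defaultdicts to plain dicts for easy display.
    return {region: dict(years) for region, years in table.items()}

result = crosstab(sales)
for region in sorted(result):
    print(region, result[region])
```

Each cell holds a total rather than a single record, which is what separates a crosstab (or pivot table) from an ordinary select query.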

Load

loading the data into data warehouses (cubes?). once data is transformed and normalized, it's ready to be finally transferred into the data warehouse or datamart. Loading sometimes happens weekly, daily, or even hourly. The more often this is done, the more up-to-date analytic reports are possible, and the more timely they can be

Data Analysis

makes sense of an organization's collected data and turns it into useful information to validate future decisions by applying statistics and logic techniques to define, illustrate, and evaluate data

Predictive analytics

more of a PROACTIVE strategy, like asking: if current sales trends continue, how much will we need for next year? A form of business analytics that attempts to reveal future patterns in a marketplace, essentially trying to predict the future by looking for data correlations between one thing and anything else that pertains to it

Chief Analytics Officer (CAO)

new position developing, deals with things that CIO (chief info officer) doesn't

Infrastructure as a Service (IaaS)

one of the main categories of Cloud Computing. The host supplies hardware, software applications, servers, and storage via the Internet. If an organization using IaaS needs more computing, IaaS can quickly adjust and supply more computing power because it is on-demand. The IaaS customer can use a pay-as-you-go model (per hour, per week, per month, etc.) that eliminates the cost of having to buy new computers. If an organization depends on IaaS and the Internet goes down, its employees cannot get work done. If the IaaS itself goes down, the organization using it could suffer because it is essentially without computing power, and that could result in a reduction of output. ex: Windows Azure or Google Compute Engine
Phinney's notes: outsources the equipment used to support data servers, storage devices, and networking components. Pay per use or monthly (could be useful during a temporary project). The provider handles networking servers, equipment, connectors, etc. Example of a large provider: Amazon Web Services (AWS). Condé Nast, a publishing company that publishes Vogue, The New Yorker, Wired, and others, lets AWS set up its infrastructure. Careful: 1 in 3 system switchovers (local control to Cloud-based) encounters a major challenge such as: 1. complex pricing (figuring out how much it costs) 2. hidden costs 3. performance issues, poor user support, and greater than expected downtimes. There were always issues going from manual to computer, just as there are issues now going from computer to the Cloud. Planning, transparency, and communication are so important

Water metaphor

overall: illustrates the massive amounts of time, money, bandwidth, and storage space required for ETL. It would be prudent to start calculating the Return on Investment (ROI) to find out whether it's worth it
Extract: building water pipes and infrastructure to lakes and rivers and taking the water back to your holding tanks
Transform: purifying and preparing water to be added to your ocean
Load: pouring the purified water into your ocean
Data Warehouse: oceans
Analyze: will your company be able to navigate the very oceans it filled to predict and validate decisions?

Form

overlay data tables and queries for more specific views of data (looks exactly like the REO Fact Sheet)

Reduce

performs calculations to get closer to an answer (like load), ex: % of social media posts that are positive, negative, or neutral

Insert anomaly

problem that occurs when trying to add incomplete records

Software as a Service (SaaS)

sometimes called "On-Demand" software; the CSP installs and operates application software in the Cloud that the Cloud user can use. For instance, if the application software needed to use a spreadsheet is already installed in your Cloud, you will be able to use it on any Internet enabled computer, even if that application software is not loaded locally on that computer. This eliminates the need to install and run application software on the Cloud user's own computer, which could potentially save businesses enormous amounts of money. ex: Google Apps, Apache OpenOffice, QuickBooks Online, Salesforce.com, and Microsoft Office 365
Phinney's notes: you subscribe to Web-delivered application software and pay a monthly service charge or per-use fee. ex of providers: Oracle, SAP, NetSuite, Salesforce, etc. SaaS can call for dummy terminals, where all of the software is running on the Cloud and computers are only there to connect to the software and harness its abilities from the Cloud.
Container Store example: there are more than 70 Container Stores and more than 2 million dollars of sales per year; payroll and benefits were put on UltiPro HCM.
TAMU example: the TAMU system went to Workday for SaaS, or Data as a Service ("DaaS"), a way of saying "we'll show you our data"; you need the Internet to take advantage of it.
Tableau (Texas Rangers): software that helps you visualize data; one of the people who founded Pixar helped make it.
Airbnb: uses Amazon Web Services to store data/databases and for processing.
Google Chromebook: is an Internet browser only; it doesn't store data, not software

Data mining

sometimes called Data Discovery; the examination of huge sets of data to find patterns and connections and to identify outliers

Text analytics

sometimes called text mining; hunts through unstructured text data to look for useful patterns, like whether an organization's customers on Facebook.com or Instagram.com are unsatisfied with its products or service

Storage as a Service (StaaS)

storing data on, or renting space from, a CSP. Because of economies of scale, CSPs like DropBox can provide storage much more cost effectively than a single business. A business can certainly buy storage devices for excess storage needs, but Cloud storage is far less expensive. One criticism of StaaS is that it requires the business using it to increase the bandwidth it needs to access the Cloud, and that is an added cost
Phinney's notes: a data storage service that rents space to people and organizations, who use it via the Internet. It is pay per usage. ex of providers: Amazon Elastic Compute Cloud, Apple iCloud, Dropbox, Google Drive, SkyDrive, Rackspace (San Antonio), Fibertown (downtown Bryan; A&M uses it and pays for the service), etc. It scales, meaning if you need more, you get more and pay more. Data is on REDUNDANT servers across multiple data centers to provide data REDUNDANCY and protect against ACCIDENTAL loss of data or NATURAL DISASTERS/ACTS OF TERROR. Pinterest example: Pinterest launched in 2010 with more than 50 million pictures, using Amazon's Simple Storage Service (S3), logged 14 Tb of data a day, and pays Amazon to keep track of data for them

Relational database

tables that are connected to other tables with related information. An important feature is that data about various things of interest (entities) are stored in separate tables. This makes it easier to add new data to the system. Redundancy is minimized and controlled

The Cloud

the Cloud is simply using computer resources, like a hard drive or software, that exist on another computer connected by a network, typically the Internet. A lot of dummy terminals use the Cloud; these are computers whose sole purpose is to open data stored in the Cloud or use the Cloud's resources only when they need them. Servers are computers that store/send out data. One disadvantage of regular internal hard drives is that they have to be backed up on a regular basis. To prudently back up a hard drive, it makes sense to back up to an external hard drive. The problem with backing up to an external hard drive is that its contents are also lost if it's not taken off site. With the Cloud, if the original internal hard drive contents of a computer are corrupted or lost, like in a fire, the contents are still intact at a different location: an off-site hard drive. That's the Cloud, or at least a small part of it. If your resume file (MyResume.docx) was on a Cloud Drive, it wouldn't require backup, as Cloud Service Providers (CSPs) like Google, Microsoft, or DropBox would provide backup automatically. This also means that you don't need any specific microcomputer to get work done, just one that can connect to the Internet. Cloud Drives can be purchased and administered just like a hard-wired network, with the same types of permissions. All twenty-five employees will be able to access the same Cloud "network" as they would the hard-wired network; therefore, the Cloud replaces the need and cost for the hard-wired network. Since the Cloud is accessible from virtually anywhere, it also replaces the need and cost for office space. Employees could simply communicate through texts, smart phones, emails, and Skype, and by implementing a virtual office space, extend their global reach.
You have to have Internet access: the Cloud is like using computer resources on another computer through the Internet. The Cloud is responsible for backing things up and having a recovery plan; it's on Cloud services to make sure nothing gets lost. A&M keeps some of its data on the Cloud and some on an internal hard drive (keeps it local, nobody can get to it). If the Internet goes down, the Cloud would not be available. Used to be called iDrive. An advantage is you don't have to worry about backups

Data visualization

the graphic display of the results of data mining, analytics and BI in general, typically in real time. Data Visualization software helps BI program results become more understandable and therefore, more meaningful in decision making

Structured Query Language

the most widely used standard computer language for traditional relational databases, as it allows a programmer to manipulate and query data. The most common use of SQL is to query one or more tables. For instance, a table might hold 10,000 book titles and their prices. If a person wanted to know which books in the data table cost more than $100, they could set up a SQL statement like the following:
SELECT Title (code) FROM Book (table) WHERE Price > 100 (value) ORDER BY Title (code)
The SQL above "Selects" the "Title" field (column) "From" the table named "Book" "Where" the book's "Price" is greater than $100. Once the query is run, the results of the query will be in "Order" of the "Title" field of the book(s). Records can be added to, deleted from, and changed in the data table using other SQL statements such as INSERT (what Access calls an append query), DELETE, and UPDATE (for example, multiplying every price by 1.05). Many SQL statements include more than one table in their query as well as multiple criteria
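
The book query described above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the `Book` table and its `Title` and `Price` fields come from the card, while the four sample titles and prices are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book (Title TEXT, Price REAL)")

# Hypothetical sample rows standing in for the 10,000 titles.
conn.executemany(
    "INSERT INTO Book (Title, Price) VALUES (?, ?)",
    [
        ("Database Design", 120.0),
        ("Intro to Poetry", 25.0),
        ("Analytics Handbook", 150.0),
        ("Cloud Basics", 100.0),
    ],
)

# The select query from the card: titles costing more than $100, in title order.
expensive = [
    row[0]
    for row in conn.execute(
        "SELECT Title FROM Book WHERE Price > 100 ORDER BY Title"
    )
]
print(expensive)

# An update query (raise every price by 5%) and a delete query.
conn.execute("UPDATE Book SET Price = Price * 1.05")
conn.execute("DELETE FROM Book WHERE Title = ?", ("Intro to Poetry",))
remaining = conn.execute("SELECT COUNT(*) FROM Book").fetchone()[0]
print(remaining)
```

Note that "Cloud Basics" is left out of the first result because its price is exactly 100, and the criterion is strictly greater than 100.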

Map Reduce

the processing arm, or engine of Hadoop. It allows data to be queried and processed directly on the server where it lives, instead of moving the data across the network to be analyzed on the computer. Map Reduce is like little computer minions that search out and query data where it resides, and process the query instead of dragging it back to a large centralized server, not unlike what Google does when it sends out Web bots to find new information for its search engine. Because Map Reduce isn't bringing vast amounts of data back to centralized servers like Data Warehouse and Datamart techniques, it saves immense amounts of network bandwidth (water flowing through pipes) and resources (processing cycles and water holding tanks)
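
The Map, Shuffle, and Reduce steps defined elsewhere in this set can be sketched together on the sentiment example. This is a single-machine illustration in plain Python, not Hadoop itself; the posts and the tiny word lists are made up, and a real cluster would run the map step on the servers where the data lives.

```python
from collections import defaultdict

# Hypothetical social media posts and sentiment word lists.
posts = [
    "love the new latte",
    "terrible service today",
    "the store opens at nine",
    "great barista, love it",
]
POSITIVE = {"love", "great"}
NEGATIVE = {"terrible", "awful"}

def map_post(post):
    """Map: tag one post with a sentiment category and a count of 1."""
    words = set(post.lower().replace(",", "").split())
    if words & POSITIVE:
        return ("positive", 1)
    if words & NEGATIVE:
        return ("negative", 1)
    return ("neutral", 1)

def shuffle(pairs):
    """Shuffle: put values that share the same category together."""
    groups = defaultdict(list)
    for category, count in pairs:
        groups[category].append(count)
    return groups

def reduce_groups(groups, total):
    """Reduce: turn each group into a percentage of all posts."""
    return {category: 100 * sum(counts) / total for category, counts in groups.items()}

mapped = [map_post(p) for p in posts]
grouped = shuffle(mapped)
percentages = reduce_groups(grouped, len(posts))
print(percentages)
```

Only the small (category, count) pairs need to travel between steps, which is the bandwidth saving the card describes.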

Solution to anomalies

the solution is to use tables in relational databases (store data in related tables) so that each entity has its own table that is related to other tables with the rest of their information
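
A small sketch of why separate, related tables help, using Python's built-in `sqlite3` module. The table names, customer, and cities are hypothetical. In the flat table the customer's city is repeated on every order, so a move means changing several rows; in the related tables it is stored once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Flat design: customer details repeated on every order row.
    CREATE TABLE FlatOrder (
        OrderID INTEGER PRIMARY KEY,
        CustomerName TEXT,
        CustomerCity TEXT,
        Item TEXT
    );
    INSERT INTO FlatOrder VALUES
        (1, 'Ann Lee', 'Bryan', 'Desk'),
        (2, 'Ann Lee', 'Bryan', 'Chair'),
        (3, 'Ann Lee', 'Bryan', 'Lamp');

    -- Related tables: each entity has its own table.
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY,
        Name TEXT,
        City TEXT
    );
    CREATE TABLE CustomerOrder (
        OrderID INTEGER PRIMARY KEY,
        CustomerID INTEGER REFERENCES Customer(CustomerID),
        Item TEXT
    );
    INSERT INTO Customer VALUES (1, 'Ann Lee', 'Bryan');
    INSERT INTO CustomerOrder VALUES (1, 1, 'Desk'), (2, 1, 'Chair'), (3, 1, 'Lamp');
    """
)

# The customer moves: count how many rows each design has to touch.
flat_rows_changed = conn.execute(
    "UPDATE FlatOrder SET CustomerCity = 'College Station' WHERE CustomerName = 'Ann Lee'"
).rowcount
normalized_rows_changed = conn.execute(
    "UPDATE Customer SET City = 'College Station' WHERE CustomerID = 1"
).rowcount
print(flat_rows_changed, normalized_rows_changed)
```

The flat design also shows the other two anomalies: a customer cannot be stored before placing an order (insert), and deleting a customer's last order deletes the customer's details too (delete).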

Extract, Transform, and Load (ETL)

tools that are used to standardize data across systems, allowing it to be queried. It is important to note that ETL must happen in order; first extracting the data, then transforming data so it fits into your data warehouse or datamart, and then loading the data into the data warehouse or datamart
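
The three steps can be sketched in order with plain Python and the built-in `sqlite3` module. This is a toy illustration under stated assumptions: the sales notes, the "shipped to" pattern, and the `Sales` table are made up, and an in-memory database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical unstructured sales notes.
raw_notes = [
    "Order 17 shipped to canada",
    "Order 18 shipped to  MEXICO ",
    "Order 19 shipped to Canada",
]

def extract(notes):
    """Extract: pull the country text out of each note."""
    return [note.split("shipped to")[1] for note in notes]

def transform(values):
    """Transform: standardize spacing and capitalization so values match."""
    return [value.strip().title() for value in values]

def load(conn, countries):
    """Load: write the cleaned values into a labeled column."""
    conn.execute("CREATE TABLE IF NOT EXISTS Sales (CountryName TEXT)")
    conn.executemany(
        "INSERT INTO Sales (CountryName) VALUES (?)",
        [(country,) for country in countries],
    )

warehouse = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(warehouse, transform(extract(raw_notes)))

# Once loaded, the data can be queried.
counts = dict(
    warehouse.execute("SELECT CountryName, COUNT(*) FROM Sales GROUP BY CountryName")
)
print(counts)
```

Without the transform step, "canada", " MEXICO ", and "Canada" would be counted as three different countries, which is why the order of the steps matters.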

Transform

transforming data into data that can go into warehouses. once you've extracted data, it needs to become normalized. Data is no good to you unless it's organized. Normalizing data means that your data is typically organized into the fields and records of a relational database. Normalizing provides the standard data format required to analyze data

Topic analytics

tries to catalog phrases of an organization's customer feedback into relevant topics. For example, if a customer said, "the barista was friendly", that would be categorized under the topic "Employee Friendliness"

Composite key

two or more attributes joined together to form a primary key; in a linking table it is made of the primary keys from the two tables that you linked, ex: 0001-0002. If a student is taking 5 classes, then their ID is a part of 5 different composite keys in the table that links the students and classes tables. If a class has 350 students, the class code will be in 350 different composite keys.

Datamart

used often by single departments or single functions within an organization, is a smaller, more focused data warehouse. Datamarts limit the complexity of databases, so you can't "answer" as much as with a Data Warehouse, but they are cheaper to implement than a full warehouse. Datamarts use data from smaller parts of an organization, like the marketing or purchasing department

Saved query file

used to find specific populations (subsets) within databases

Four V's of Big Data

used to make sure data is of good quality
Volume - sheer quantity (Walmart, the number of tweets sent around the world in one day). Refers to the amount of data collected by an organization. How much data does your business need, and further, where do you keep it once you've collected it?
Velocity - the speed to gather/process information (ads in Heathrow geared for individuals recognize who you are and what ad should be displayed for you based off of demographics research). How fast can you collect data, and more importantly, how quickly can you analyze it?
Variety - is it Structured (business transactions, orders, shipments, recording the selling of things), Semi-structured (not structured but there are some tags, XML or HTML tags), or Unstructured (tweets, customer complaints, calls, posts on Facebook)?
Veracity - is the data your organization collected any good? Is it clean or scrubbed? Lots of data sources have data that is not "clean," meaning it may be too fragmented to be valuable or usable, or that it was simply collected poorly in the first place

Platform as a Service (PaaS)

when a Cloud provider supplies a computer platform (like Windows) through the Internet. Also provided in PaaS are programming applications, databases, and web servers. One advantage of PaaS is that large business organizations can contract Cloud services to replace many things they already do, like administering their own platforms, their own programming environments, and their own databases. That could mean that the business saves costs and complexity because it no longer needs as many of the information technology people who traditionally do these jobs. It also means that the business can buy as much or as little PaaS as it needs, so it is never short on essential computing services and does not have to worry that it has too much. When a large organization grows, it can scale its PaaS larger, and vice versa. However, what happens if a business wants to be more flexible and use new programming languages that its PaaS doesn't provide? What happens if the PaaS database does not fit the organization's needs? What happens if the business decided to sign a long-term contract with a Cloud PaaS provider? These questions point out some disadvantages of PaaS: a business organization can be locked in to a PaaS that may not be flexible enough to allow the company to grow. They also point out that, had it kept its information technology staff, the business would still have had the ability to do what it considers necessary to grow. ex: Amazon's Elastic Beanstalk, Force.com, and Google's App Engine
Phinney's notes: provides you a platform to develop and launch apps, as well as an operating system, programming language execution environment, database services (DBMS), and web services. Examples: Microsoft Azure, Google App Engine (pay), Amazon Web Services Elastic Beanstalk (cloud). Users can create apps on the PaaS and then deploy them, which encourages entrepreneurship

linking table

needed when there is a many-to-many relationship; an attribute that is a primary key in one table is a part of a composite primary key in the linking table. The data type (number/short text/date) has to be the same in both tables for two fields to be linked
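
A many-to-many relationship between students and classes can be sketched with a linking table whose primary key is a composite of the two linked keys. The example uses Python's built-in `sqlite3` module; the `Enrollment` table name, the student names, and the class titles are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Student (UIN TEXT PRIMARY KEY, Name TEXT);
    CREATE TABLE Class (CRN TEXT PRIMARY KEY, Title TEXT);

    -- Linking table: both fields use the same data type (TEXT) as the keys
    -- they point to, and together they form the composite primary key.
    CREATE TABLE Enrollment (
        UIN TEXT REFERENCES Student(UIN),
        CRN TEXT REFERENCES Class(CRN),
        PRIMARY KEY (UIN, CRN)
    );

    INSERT INTO Student VALUES ('0001', 'Pat'), ('0002', 'Sam');
    INSERT INTO Class VALUES ('0002', 'ISTM 210'), ('0003', 'ACCT 229');
    INSERT INTO Enrollment VALUES ('0001', '0002'), ('0001', '0003'), ('0002', '0002');
    """
)

# One student appears in several composite keys, one per class taken.
classes_for_pat = [
    row[0]
    for row in conn.execute(
        """
        SELECT Class.Title
        FROM Enrollment JOIN Class ON Class.CRN = Enrollment.CRN
        WHERE Enrollment.UIN = '0001'
        ORDER BY Class.Title
        """
    )
]
print(classes_for_pat)

# The composite key is unique: the same student/class pair cannot be added twice.
try:
    conn.execute("INSERT INTO Enrollment VALUES ('0001', '0002')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
print(duplicate_allowed)
```

Neither UIN nor CRN is unique on its own inside the linking table; only the pair is, which is what makes the key composite.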

Delete anomaly

when you delete a record, you delete all information about that entity

Referential integrity

won't let you refer to an entity that doesn't exist, ex: will only let you sign up for a class if you have a valid UIN
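
The class sign-up example can be sketched with Python's built-in `sqlite3` module. The `Registration` table, the sample UINs, and the CRN are hypothetical. One detail specific to SQLite: it only enforces foreign keys after `PRAGMA foreign_keys = ON` is issued for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    PRAGMA foreign_keys = ON;  -- SQLite enforces foreign keys only when enabled

    CREATE TABLE Student (UIN TEXT PRIMARY KEY, Name TEXT);
    CREATE TABLE Registration (
        UIN TEXT NOT NULL REFERENCES Student(UIN),
        CRN TEXT NOT NULL
    );
    INSERT INTO Student VALUES ('0001', 'Pat');
    """
)

def register(uin, crn):
    """Return True if the sign-up is accepted, False if the UIN does not exist."""
    try:
        conn.execute("INSERT INTO Registration VALUES (?, ?)", (uin, crn))
        return True
    except sqlite3.IntegrityError:
        return False

valid = register("0001", "12345")    # UIN exists in Student
invalid = register("9999", "12345")  # no such student
print(valid, invalid)
```

The second sign-up is refused by the database itself, so no form or report can end up pointing at a student who is not in the Student table.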

