BUAD 342 Exam 1
volume, variety, value, velocity, veracity
5Vs of big data
blockchain
A digital ledger in which transactions made in bitcoin or another cryptocurrency are recorded chronologically and publicly
Cloud Computing
A general term for the delivery of hosted services over the internet.
DDoS (Distributed Denial of Service)
An attack on a computer or network device in which multiple computers send data and requests to the device in an attempt to overwhelm it so that it cannot perform normal operations.
phishing
An attack that sends an email or displays a Web announcement that falsely claims to be from a legitimate enterprise in an attempt to trick the user into surrendering private information
Central Processing Unit (CPU)
Brain of the computer that performs instructions defined by software (1. fetch instruction 2. decode instruction 3. execute instruction 4. repeat)
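The four-step cycle in the card above (fetch, decode, execute, repeat) can be sketched as a toy loop. This is only an illustration, not how a real CPU is built; the instruction names LOAD, ADD, and HALT are hypothetical.

```python
# Toy fetch-decode-execute loop. The instruction set (LOAD, ADD, HALT)
# is hypothetical and exists only for this illustration.
def run(program):
    accumulator = 0
    pc = 0  # program counter: index of the next instruction
    while True:
        instruction = program[pc]      # 1. fetch instruction
        opcode, operand = instruction  # 2. decode instruction
        pc += 1
        if opcode == "LOAD":           # 3. execute instruction
            accumulator = operand
        elif opcode == "ADD":
            accumulator += operand
        elif opcode == "HALT":
            return accumulator
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        # 4. repeat: the loop returns to the fetch step

result = run([("LOAD", 2), ("ADD", 3), ("HALT", None)])
print(result)
```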
worms
Replicates itself in order to spread to other computers
Random Access Memory (RAM)
Temporary memory a computer uses to store information while it is processing. It affects the speed and latency of processing.
batch processing
Transaction data is collected and stored for processing at a scheduled time or when a specified amount of data has been collected. (ex. payroll, credit card transaction)
real time processing
Transaction data is processed immediately or shortly after the transaction occurs. (ex. airline or ticket reservation)
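The difference between the two processing modes can be sketched as follows. The transaction names and the batch size are assumptions chosen for illustration; a real system might also trigger a batch on a schedule rather than on a count.

```python
def run_real_time(transactions):
    log = []
    for txn in transactions:
        log.append(f"received {txn}")
        log.append(f"processed {txn}")  # handled as soon as it arrives
    return log

def run_batch(transactions, batch_size):
    log, pending = [], []
    for txn in transactions:
        log.append(f"received {txn}")
        pending.append(txn)             # stored for later processing
        if len(pending) == batch_size:  # specified amount of data collected
            log.extend(f"processed {t}" for t in pending)
            pending.clear()
    return log, pending                 # pending holds unprocessed leftovers

print(run_real_time(["t1", "t2"]))
print(run_batch(["t1", "t2", "t3"], batch_size=2))
```

In the batch run, the third transaction stays in the pending list until another batch fills up.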
big data
a broad term for datasets so large or complex that traditional data processing applications are inadequate.
decision
a choice made from available alternatives (computer choosing yes or no answer)
router
a device that allows your specific devices access to the internet provided by the modem
Internet of Things (IoT)
a giant network of connected things. The idea that objects are becoming connected to the Internet so they can interact with other devices, applications, or services.
Software as a Service (SaaS)
a method for delivering software applications over the internet on demand and typically on a subscription basis
schema
a model of the data
machine learning
a subset of AI that trains machines how to learn
client
any computer hardware or software device that requests access to a service provided by a server
cybersecurity
application of technologies, processes, and controls to reduce the risk of unauthorized exploitation
composite attribute
attribute you likely want to break down into pieces (example: employee address)
derivable attributes
attributes that you can create or compute using other relation attributes
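A derivable attribute can be sketched as a value computed from stored attributes instead of being stored itself. The OrderLine record and its field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class OrderLine:
    quantity: int      # stored attribute
    unit_price: float  # stored attribute

    @property
    def line_total(self):
        # Derived on demand from the stored attributes, never stored.
        return self.quantity * self.unit_price

line = OrderLine(quantity=3, unit_price=2.5)
print(line.line_total)
```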
NoSQL
big data databases - use schema on read
procure to pay
buy "things" event
conversion
buy materials and convert to finished goods
unstructured/no structure data
can be text or non-text: Snapchat messages, tweets, mail messages, PowerPoint presentations, etc.
multivalued attribute
can have more than one value at a time (example: employee skills, education, etc.)
extract
capture data step
central repository
central place where all the data is stored
transform
clean or "scrub" data step
source data automation (SDA)
collecting data at its point of origin in digital form
rootkits, bootkits, spyware, viruses, worms
common cyber threats
numeric, text, booleans (only true or false), date, geolocator
common data types
modems
connect your network to the broader internet
master files
contain balance data or the status of an entity at a point in time
transaction files
contain business activity data (i.e., transaction data used to update balances in master files). Examples include order files, billing files, shipping files, and cash receipt files.
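How transaction data updates balances in a master file can be sketched with a small in-memory example. The customer IDs and amounts are made up for illustration.

```python
# Master file: customer id -> current balance (hypothetical data).
master = {"C001": 100.0, "C002": 250.0}

# Transaction file: business activity that changes those balances.
transactions = [("C001", 40.0), ("C002", -50.0), ("C001", 10.0)]

def post(master, transactions):
    updated = dict(master)  # leave the original master untouched
    for customer_id, amount in transactions:
        updated[customer_id] += amount
    return updated

updated_master = post(master, transactions)
print(updated_master)
```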
suspense files
contain data awaiting some action to complete their processing
history files
contain inactive past or historical data
reference files
contain referential data such as a tax rate schedule, a volume pricing list, or a listing of the general ledger chart of accounts.
tree/hierarchical, network, relational, object oriented
database management systems
ERP (Enterprise Resource Planning)
deals with big data and allows all departments to have access to data
Infrastructure as a Service (IaaS)
delivers hardware networking capabilities, including the use of servers, networking, and storage, over the cloud using a pay-per-use revenue model
Disruptive Technology
displaces an established technology and shakes up the industry or a ground-breaking product that creates a completely new industry
semi structured data
has information associated with it, such as metadata and tags. e.g. JSON (JavaScript Object Notation), data represented as graphs
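A small JSON document shows what semi-structured data looks like: the keys act as tags that describe the values, but no fixed table layout is required. The record below is a made-up example.

```python
import json

# Hypothetical record: keys describe the values, and nesting is allowed.
text = '{"user": "ana", "tags": ["promo", "mobile"], "meta": {"lang": "en"}}'
record = json.loads(text)

print(record["tags"])
print(record["meta"]["lang"])
```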
hybrid cloud
includes two or more private, public, or community clouds, but each cloud remains separate and is only linked by technology that enables data and application portability
relational
tables interact with one another through relationships (drawn as lines between tables)
data warehouse
is a central repository of information created with data from transactional systems, relational databases, and other sources. Used to enable a large number of users to concurrently monitor business performance and extract insights useful for decision making
ChatGPT
is a natural language processing tool driven by AI technology that allows you to have human-like conversations and much more with the chatbot. runs on a language model architecture created by OpenAI called the Generative Pre-trained Transformer (GPT).
bot
is a program that operates as an agent for a user or another program or simulates a human activity.
deep learning
is a subfield of machine learning that is concerned with emulating the learning approach that human beings use to gain certain types of knowledge.
1. capturing and recording source data 2. maintaining reference data 3. generating outputs/reports
key information processing steps (3)
load
load data step
operating system
manages all of the software and hardware on your computer (Windows, macOS, iOS)
object oriented
most commonly used today ex. canvas
Deliver goods
most important step in the cash to order process (the point at which goods are considered sold)
network
multiple paths to get to the data
hierarchical
only one route to get to the data in the model
public clouds
owned and operated by a third-party cloud service provider, which delivers its computing resources, like servers and storage, over the Internet.
servers
physical or virtual machines that coordinate the computers, programs, and data that are part of the network
receive and inspect goods or services from a vendor
point of purchase in PTP
trojans
poses as trustworthy software
Porter's value chain
recognizes that organizations are elements of a value system (supply chain) (things that businesses do to create and sell products)
Platform as a Service (PaaS)
refers to cloud computing services that supply an on-demand environment for developing, testing, delivering, and managing software applications.
big data repositories
schema-on-read approach
spyware
secretly steals all private information
Cash to Order
sell "things" event
private cloud
serves only one customer or organization and can be located on the customer's premises or off the customer's premises
ransomware
shuts down access to a network's essential parts; holds the PC hostage and demands money
malware
software that is intended to damage or disable computers and computer systems.
extract-transform-load
steps to construct a data warehouse
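The three steps can be sketched end to end with the Python standard library. The source data, the cleaning rules, and the sales table are all assumptions for illustration; an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw source data with messy names and a missing amount.
RAW = "name,amount\n alice ,10\nBOB,\ncarol,7\n"

def extract(text):
    # Extract: capture the data from its source.
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    # Transform: clean or "scrub" the data.
    clean = []
    for row in rows:
        if not row["amount"]:
            continue  # drop rows with a missing amount
        clean.append((row["name"].strip().title(), int(row["amount"])))
    return clean

def load(rows, conn):
    # Load: write the cleaned rows into the central repository.
    conn.execute("CREATE TABLE sales (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)
```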
networks, devices, applications, and data centers
systems that are especially vulnerable to threats
motherboard
the main circuit board of the computer; it holds the firmware (BIOS), the very minimal piece of software that comes with a computer
sequence
the order in which things happen or should happen (steps for computer)
artificial intelligence
the simulation of human intelligence processes by machines, especially computer systems
Robotic Process Automation (RPA)
the use of software with artificial intelligence (AI) and machine learning capabilities to handle high-volume, repeatable tasks that previously required a human to perform
limited standardization, if any (ex. Facebook collecting all of your data)
times big data repositories are used
structured data
data in a traditional database, which assumes data will go into tables, so the data is structured to fit that model
traditional databases
use the schema-on-write approach
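The contrast between schema-on-write and the schema-on-read approach used by big data repositories can be sketched as follows. The schema, field names, and default values are hypothetical; real databases enforce this differently.

```python
import json

SCHEMA = {"name": str, "age": int}  # hypothetical schema

def write_with_schema(table, record):
    # Schema-on-write: validate before storing; reject what does not fit.
    if set(record) != set(SCHEMA):
        raise ValueError("record does not match schema")
    for field, expected in SCHEMA.items():
        if not isinstance(record[field], expected):
            raise ValueError(f"{field} must be {expected.__name__}")
    table.append(record)

def read_with_schema(raw_store):
    # Schema-on-read: store anything; apply structure only when reading.
    for line in raw_store:
        doc = json.loads(line)
        yield {"name": str(doc.get("name", "")), "age": int(doc.get("age", 0))}

table = []
write_with_schema(table, {"name": "Ana", "age": 30})

# Raw documents were stored as-is, with extra and missing fields.
raw_store = ['{"name": "Ben", "age": "41", "city": "Austin"}', '{"name": "Cy"}']
rows = list(read_with_schema(raw_store))
print(rows)
```

The write path refuses records that do not fit the schema, while the read path accepts whatever was stored and shapes it at query time.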
electronic data interchange (EDI)
used so one company can share information with another company electronically, with no human interaction required
adware
used to display banner advertising and other commercial advertising
data
what is captured
repetition
when a program may need to run through a set of steps more than once depending on the given input
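Sequence, decision, and repetition can all be seen in one short function. The function name and the input values are made up for illustration.

```python
def count_even(numbers):
    count = 0           # sequence: statements run in order
    for n in numbers:   # repetition: the steps repeat once per input item
        if n % 2 == 0:  # decision: a yes-or-no choice
            count += 1
    return count

print(count_even([1, 2, 3, 4]))
```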
when we know what kind of data we will need before we even begin to collect it (data stored in tables with relationships)
when to use traditional databases
hard drive
your computer's storage system
AlphaGo
In 2016, the Go-playing computer program, Google's AlphaGo, defeated the best Go player of the last decade, Lee Sedol. AlphaGo won by resignation after 186 moves. Go is regarded as one of the hardest games for computers to master due to its sheer complexity. There are roughly 200 possible moves for a given turn compared to about 20 in chess, and more possible board configurations than the number of atoms in the universe.