CIBRC_4A- Systems, Databases, Networks

BOOLEAN - cannot be a text string like "yes" or "no"; has to be ________

true or false

VARCHAR(50) means?

a character string up to 50 characters long

Features of Network Protocols -_____________ - the sender and receiver announce their presence to one another and ask permission to exchange data (e.g., to fetch the contents of a webpage) -_____________ - "tell me when you receive this": the receiver sends back an acknowledgement message -___________ - the return message that goes with the request -Multiple protocols exist for multiple purposes -Distinguish the Network Protocol from the ___________ Standard Ex: an HL7 v2.x message is a pipe-delimited text file and a v3 message is an XML file; both can be transmitted via TCP/IP across a network. (In the pipe-delimited file, the position of a field specifies which data attribute it holds. Whether the file is version 2 text or version 3 XML, the TCP/IP network protocol can transmit data in either encoding standard.)

"Handshake" Acknowledgement Payload Data or Encoding

-Design of MUMPS predated, anticipated the ____________ movement, where you can on-the-fly add attributes and relations without having to rewrite the entire DB schema

"NoSQL" and "schema-less DB"

ER Diagram - e.g., relationship between customer and purchase order -Customer is an optional participant with _____ symbol -Only one customer can participate (_____ symbol) -A customer can have multiple purchase orders (________ arrow symbol) -Details the _________ of customer and purchase order

"O" "|" three-pronged attributes

The _______ principles are used to describe reliable, fault-tolerant databases.

ACID (Atomicity, Consistency, Isolation, Durability)

Regarding MUMPS: -Multi-user access (the original paper recognized the potential for conflicting updates and the need for __________ transactions)

ACID (atomicity, consistency, isolation, durability)

_________ is an example of a data link layer protocol; it is a mechanism for translating network layer addresses (an IP address) to link layer addresses (an Ethernet address)

ARP (Address resolution protocol)

Reliable DB Transactions: the _______ Test ____________ -transaction is indivisible: it either happens or it doesn't, with no possibility of a partial transaction (ex: a DB transaction that updates 2 cells either does both or neither) ____________ -transaction meets all constraint rules (can't add a DATE to an INT field, can't have a non-unique PK) ____________ -RDBMS must be able to sequence simultaneous transactions (ex: 2 transactions update the same cell. Both must take place, but not at the same time; one read-write event must be isolated from the other, or else there is a collision and a write-write failure) ____________ -system must be tolerant to failure (ex: the RDBMS has queued 200 transactions in memory and the power fails. The system must know how many of those were committed and how many were not, i.e., whether each full transaction took place or didn't)

ACID (e.g., MUMPS is ACID compliant. All database transactions must follow these principles) Atomicity Consistency Isolation Durability
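
The "updates 2 cells, does both or neither" example can be sketched with Python's built-in sqlite3 module. This is a minimal illustration of atomicity and consistency only; the account table, its CHECK constraint, and the transfer helper are hypothetical names chosen for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the CHECK constraint is the "consistency" rule.
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Update two cells as one unit: both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, 1, 2, 30)       # both updates succeed
failed = transfer(conn, 1, 2, 500)  # second update violates the CHECK
balances = dict(conn.execute("SELECT id, balance FROM account"))
```

In the failed transfer the first UPDATE has already run when the constraint fires, so the rollback is what keeps the credit to account 2 from surviving on its own.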

OSI Seven Layer Model Host -Data [transmission of message via HTTP, SMTP, FTP] 7. _______________ 6. ____________ (e.g. character encoding in ASCII or data encryption) 5. ____________ (ex: a web conference may have a persistent session to sync audio and video; e.g., you may need to authenticate to join a web session, and that link expires once you disconnect, so you can't join again without re-authenticating) -Segments 4. ____________ [transmission of segments via TCP, UDP] Media -TCP Packet / UDP Datagram 3. Network [transmission of packets via IP (Internet Protocol), DNS (Domain Name System) servers, through routers] -Frame 2. Data Link [transmission of frames via Ethernet or PPP] -Bit 1. Physical [transmission of binary bits via copper wire, coaxial, or fiber optic cable]

Application (HTTP is an application layer protocol) Presentation / Syntax Session Transport

___________ Join -Rarely used -Gives you the "cross product" of both tables -Often arises by accident when joins don't have correct constraints or if you omit a WHERE clause constraint -Sometimes used to generate _______ data -Uses the SQL "_________ JOIN" statement and omits the "ON" clause. The result is every combination of faculty and interest, whether or not the pairing is correct: 5 faculty and 5 interests give 5 x 5 = 25 rows. Most of the time you wouldn't want this, but it can be used to rapidly generate fake data

Cartesian (Cross) test CROSS
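
The 5 x 5 = 25 arithmetic can be checked with a short sqlite3 sketch. The faculty and interest tables and their placeholder values are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE faculty (name TEXT)")
conn.execute("CREATE TABLE interest (topic TEXT)")
# Five placeholder rows in each table (hypothetical data).
conn.executemany(
    "INSERT INTO faculty VALUES (?)",
    [(f"faculty_{i}",) for i in range(1, 6)],
)
conn.executemany(
    "INSERT INTO interest VALUES (?)",
    [(f"interest_{i}",) for i in range(1, 6)],
)

# CROSS JOIN has no ON clause: every faculty row pairs with every interest row.
rows = conn.execute(
    "SELECT name, topic FROM faculty CROSS JOIN interest"
).fetchall()
row_count = len(rows)
```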

Unified Modeling Language (UML): Standard toolset for describing aspects of databases, software, business processes. A few examples below: -________ diagram to describe OO classes (visual depiction of the following: name, hierarchies, attributes, methods) -________ diagram ~ process flowchart, stepwise description of decisions, consequences, inputs, outputs -__________ diagram -describes actors (participants in an activity), goals, dependencies (btw goals/actor). Has _________ figures in them. - _________ diagram (as in Caboodle) - describes objects and their relationships. Can then be used to define RDBMS logical schema, which DB programmers can use to build physical schema of a DB

Class (see slide 47) Activity Use Case; stick Entity-Relationship (ER)

Object-Relational Mapping (ORM) -Parallelism between OOP (object-oriented programming) and RDBMS is very useful programmatically -Object-oriented Class <--> __________ (if you think of a class as a DB relation, you can think of an instance of that class as a tuple) -Instance of that class <--> ___________ in a Relation (table), where each row is a member of the class described by those attributes (so the class of all patients is the patient table, and an instance of that class is a specific patient represented as a single row in that relation) -Attribute <--> ____________, where the Value of that attribute is the __________ (content of the cell) -Method (accessors, mutators) <--> __________________ functions; i.e., the getter and setter functions (how you retrieve and write data using OOP) are analogous to the ______ functions of a DBMS -Many modern programming languages use ___________, either built-in or available as an extension, b/c the parallelism above is so useful

DB Relation Tuple (a specific row) Attribute (column); element database manipulation (CRUD) CRUD Object-Relational Mapping (ORM)
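
The class/relation parallel can be sketched by hand with a dataclass and sqlite3. This is not a real ORM library, only an illustration of the mapping; the Patient class and the save and load helpers are hypothetical.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Patient:      # class <-> relation (table)
    pat_id: int     # attribute <-> column
    name: str
    age: int

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient (pat_id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
)

def save(conn, p):
    """Setter-like method <-> INSERT (the 'create' in CRUD)."""
    conn.execute(
        "INSERT INTO patient VALUES (?, ?, ?)", (p.pat_id, p.name, p.age)
    )

def load(conn, pat_id):
    """Getter-like method <-> SELECT (the 'read' in CRUD)."""
    row = conn.execute(
        "SELECT pat_id, name, age FROM patient WHERE pat_id = ?", (pat_id,)
    ).fetchone()
    # One row (tuple) becomes one instance of the class.
    return Patient(*row) if row else None

save(conn, Patient(1001, "DOE, JOHN", 34))
loaded = load(conn, 1001)
```

An ORM library generates this kind of glue automatically from the class definition.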

Software that allows users to interact with a DB and maintain its structure and integrity is called a _________

DBMS (Database Management System)

_____________ is a network layer protocol for assigning IP addresses to a host.

DHCP

OSI Seven Layer Model Host -_________ [transmission of message via HTTP, SMTP, FTP] 7. Application (HTTP is an application layer protocol) 6. Presentation / Syntax (e.g. character encoding in ASCII or data encryption) 5. Session (ex: a web conference may have a persistent session to sync audio and video; e.g., you may need to authenticate to join a web session, and that link expires once you disconnect, so you can't join again without re-authenticating) -__________ 4. Transport [transmission of segments via TCP, UDP] Media -______________ 3. Network [transmission of packets via IP (Internet Protocol), DNS (Domain Name System) servers, through routers] -____________ 2. Data Link [transmission of frames via Ethernet or PPP] -________ 1. Physical [transmission of binary bits via copper wire, coaxial, or fiber optic cable]

Data Segments TCP Packet / UDP Datagram Frame Bit
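
How data becomes a segment, then a packet, then a frame can be pictured as successive wrapping. The toy below uses bracketed text labels in place of real binary headers; the HEADERS list and both functions are assumptions for illustration and do not reflect actual TCP, IP, or Ethernet formats.

```python
# Toy header labels, ordered from the transport layer down to the data link layer.
HEADERS = ["TCP", "IP", "ETH"]

def encapsulate(message: str) -> str:
    """Wrap application data in one labelled layer per header, inner to outer."""
    data = message
    for header in HEADERS:
        data = f"[{header}|{data}]"
    return data

def decapsulate(frame: str) -> str:
    """Strip the layers in reverse order, as the receiving host would."""
    data = frame
    for header in reversed(HEADERS):
        prefix = f"[{header}|"
        if not (data.startswith(prefix) and data.endswith("]")):
            raise ValueError(f"expected {header} header")
        data = data[len(prefix):-1]
    return data

frame = encapsulate("GET /index.html")
recovered = decapsulate(frame)
```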

Data Mining & KD: Terms to Know Three key steps in Data Mining are pre-processing, extraction, and evaluation & presentation. The following are sub-steps •______________: Removal of noisy and irrelevant data from collection. • Addressing missing values, noisy data, data discrepancy through various data transformations •______________: Combining heterogeneous data from multiple sources into a common source (DataWarehouse). The DataWarehouse is where you do all the integration and cleansing. • Migration, synchronization, and ETL (Extract-Transform-Load, i.e., migration, synchronization of data) process. •______________: Data selection is defined as the process where data relevant to the analysis is decided and retrieved from the data collection. • Can involve statistical methods to identify patterns / outliers, clusters (regression, machine learning) •______________: Data Transformation is defined as the process of transforming data into the appropriate form required by the mining procedure. Data Transformation is a two-step process (so you might do binning or pre-processing, like taking first and last names and putting them in all caps or lower case, etc.): •D

Data Cleaning Data Integration Data Selection Data Transformation
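
The cleaning and transformation sub-steps (handling missing values, normalizing name case, binning) might look like the sketch below. The records, field names, and decade-wide bins are all hypothetical choices for illustration.

```python
# Hypothetical raw records with inconsistent case and one missing value.
raw = [
    {"first": "louise", "last": "Chen", "age": 67},
    {"first": "JOHN", "last": "doe", "age": None},
    {"first": "Ana", "last": "RUIZ", "age": 34},
]

def clean(records):
    """Data cleaning: drop records with a missing age."""
    return [r for r in records if r["age"] is not None]

def transform(record):
    """Data transformation: normalize name case and bin age by decade."""
    decade = (record["age"] // 10) * 10
    return {
        "name": f'{record["last"].upper()}, {record["first"].upper()}',
        "age_bin": f"{decade}-{decade + 9}",
    }

prepared = [transform(r) for r in clean(raw)]
```

Dropping incomplete records is only one way to address missing values; imputing a value is another common choice.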

From Google, other sources: Data Transformation Process stages: 1. _______________: During the first stage, data teams work to understand and identify applicable raw data. Data transformation's first step is to identify and realize data in its original or source format. Normally, a data profiling tool is used to carry out this step. 2. _____________: Assigning elements from source base to destination to capture transformations. During this phase, analysts determine how individual fields are modified, matched, filtered, joined, and aggregated. (performed with the help of ETL (Extract Transform Load) data mapping tools. This step is the most time-consuming segment of data transformation as it entails sub-processes such as validation, translation, value derivation, enrichment aggregation, and routing.) 3: _____________ -Creation of the actual transformation program. In this step, code is generated to run the data transformation task. Modern integration platforms are employed to generate the code, thereby simplifying the task for enterprises. 4: ___________: -In this step, code is executed to produce desired output. 5: __________ -Finally, the converted output is verified or reviewed to che

Data Discovery Data mapping Code Generation Code Execution Verification

Data Governance / Stewardship -____________:Framework of processes, tools, methods, and oversight that ensures availability, usability, consistency, integrity, and security of data. Benefits of strong DG practices include improved ability to collect, view, store, exchange, aggregate, analyze, manage, archive, and reuse data. -____________ (person or group): Ensures that data governance processes are followed and enforced. -________Acronym: _____________

Data Governance Data Steward FAIR; Findable (e.g., via metadata), Accessible, Interoperable, Reusable (for Franciscan, Bimal has a Data Governance Tool)

Data Mining & KD: Terms to Know Three key steps in Data Mining are pre-processing, extraction, and evaluation & presentation. The following are sub-steps •Data Cleaning: Removal of noisy and irrelevant data from collection. • Addressing missing values, noisy data, data discrepancy through various data transformations •Data Integration: Combining heterogeneous data from multiple sources into a common source (DataWarehouse). The DataWarehouse is where you do all the integration and cleansing. • Migration, synchronization, and ETL (Extract-Transform-Load, i.e., migration, synchronization of data) process. •Data Selection: Data selection is defined as the process where data relevant to the analysis is decided and retrieved from the data collection. • Can involve statistical methods to identify patterns / outliers, clusters (regression, machine learning) •Data Transformation: Data Transformation is defined as the process of transforming data into the appropriate form required by the mining procedure. Data Transformation is a two-step process (so you might do binning or pre-processing, like taking first and last names and putting them in all caps or lower case, etc.):

Data Mapping Code generation Pattern Evaluation Knowledge Representation

Many of us are getting more involved in the analytics side and these are important concepts for that: Data Mining & Knowledge Discovery (KD) ____________: automatic summarization, identifying essential information, discovery of patterns in data

Data Mining

Data Mining & KD -Example Task: use EMR data to identify patients with a specific disease phenotype for correlation with genomic data •_________________ -combining administrative, clinical, and genomic data •_____________ -machine learning to "train" an algorithm to identify suspected cases •______________ -Natural Language Processing (NLP) tools to extract medication mentions in free-text notes, mapping to canonical terminology •___________________ -"Manhattan plot" to highlight which SNPs are associated with the candidate disease

Data cleaning & integration Data selection Data transformation Pattern evaluation, Knowledge Representation

Any collection of related data (address book, spreadsheet, MS Access (Microsoft Access Databases back in the day))

Database

Data Warehouse & Data Marts -Extract/Transform/Load (ETL) process gets transactional data into a format that is optimal for reporting / queries -Real life example: Epic EHR runs on Intersystems Caché Object DB for transactional processing. Has a nightly process to push data into an RDBMS (e.g. Oracle SQL) -___________ is a smaller collection of related tables and data derived from the warehouse for a specific purpose, usually for analysis, report generation, spreadsheet, dashboards, etc. Example: EHR may have a real-time transactional DB, nightly dump to a SQL data warehouse, and weekly extracts to a datamart to generate an updated enterprise asthma performance dashboard.

Datamart

_________- Normal Form A DB is said to be in _____ if it can meet the following conditions: •Each cell contains a single value (our "Flat File" example breaks this rule) •Each record is unique (no duplicate rows) • susceptible to certain INSERT, DELETE, and UPDATE anomalies: -We can't use Patient_ID to uniquely identify each row -this table requires a "composite key" -INSERT: A patient can't have a Pharmacy without a prescription, unless we create a row with NULL values -DELETE: If the med "acetaminophen" is deleted, the Pharmacy "Glenbrook" ceases to exist -UPDATE: If the "Medstar" pharmacy chain changes their name, we have to edit multiple cells in this table

First 1NF

_______________: -Convenient, easy, ubiquitous (like an Excel spreadsheet) -May require redundant data (e.g., Louise Chen: you have to remember to indicate in each cell that she is deceased) -Can't represent 1-to-many relationships easily (Louise has 2 meds; less useful than having a discrete representation of both meds) -Limited ability to enforce data integrity (nothing prevents multiple spellings of "yes" and "no", or inconsistent entries in the med column, such as all caps in some cells and brand names in others) -Incomplete data represented as blank cells

Flat file

_______________: an attribute whose values must have matching values in the primary key of another table

Foreign key
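
The "must have matching values in the primary key of another table" rule can be seen in action with sqlite3. The table names are hypothetical, and note that SQLite only enforces foreign keys after the PRAGMA shown below is turned on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE patient (pat_id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE medication ("
    "med_id INTEGER PRIMARY KEY, "
    "pat_id INTEGER NOT NULL REFERENCES patient(pat_id))"
)
conn.execute("INSERT INTO patient VALUES (1)")
conn.execute("INSERT INTO medication VALUES (100, 1)")  # matching PK exists

try:
    # No patient 999 exists, so this foreign key value has no match.
    conn.execute("INSERT INTO medication VALUES (101, 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

med_count = conn.execute("SELECT COUNT(*) FROM medication").fetchone()[0]
```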

SQL Joins ________ join will include all rows in both tables, blanks in both

Full outer

__________________ -e.g., a tree hierarchy of data in which every branch is a single patient: if I pluck out that one branch, I can tell you everything about that one patient. These are very efficient because a tree is computationally easy to traverse (the challenge is that you can only traverse the tree from the root (top parent) node) -Structurally different from RDBMS -Optimized for rapid transactions of hierarchical data -In very simple terms, makes it easy to know "every ___________ about one thing" (quickly retrieve all known information about patient 1001) -Computationally easy to traverse the tree. Can only traverse tree from root (_________) node Ex: "find all deceased patients who were ordered topiramate" would be "easier" in a relational DB than a hierarchical DB (b/c you can first find every pt that is deceased and then check that subset to see who was ordered topiramate; in a hierarchical DB, you would have to go one by one through every patient, every order, and every deceased status) -Child nodes can only have ___ parent -Difficult to model relationship between child nodes (many-to-many, recursive relationships); recursive relat

Hierarchical Database attribute top parent 1 both

SQL Joins An ________ join statement can also be written as a "WHERE" clause _______ JOINS cannot be written using a WHERE clause If there is no match in ORDER for a specific PAT_ID, that PATIENT will not appear in the resultset. That's what is meant by the key word _________

INNER OUTER "INNER"
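
The equivalence between an INNER JOIN and a WHERE clause can be checked with sqlite3. The tables and rows are hypothetical, and the order table is named "orders" here because ORDER is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (pat_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, pat_id INTEGER, med TEXT);
INSERT INTO patient VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO orders VALUES
    (10, 1, 'acetaminophen'), (11, 1, 'topiramate'), (12, 3, 'ibuprofen');
""")

# Explicit INNER JOIN syntax.
explicit = conn.execute(
    "SELECT p.name, o.med FROM patient p "
    "INNER JOIN orders o ON p.pat_id = o.pat_id ORDER BY o.order_id"
).fetchall()

# The same join written with a WHERE clause.
implicit = conn.execute(
    "SELECT p.name, o.med FROM patient p, orders o "
    "WHERE p.pat_id = o.pat_id ORDER BY o.order_id"
).fetchall()
```

Patient B has no matching order, so B appears in neither result set.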

Examples of Protocols by OSI Layer Layer: Example Protocols 7: Application - POP3/IMAP4 for email, HTTP for web content, FTP for file transfer, SSH and HTTPS for secure browsing 6: Presentation - Encryption, decryption, conversion to character sets (like ASCII); also Unicode 5: Session - LDAP "Lightweight Directory Access Protocol" for authenticating users against X.500 directories; authenticate users to access a specific session on a website 4: Transport - TCP "Transmission Control Protocol" (with acknowledgement), UDP "User Datagram Protocol" (without acknowledgement) 3: Network - ______________ "Dynamic Host Configuration Protocol" used to assign IP addresses to hosts, for example, when you connect to a wireless hotspot 2: Data Link - ____________ used by TCP/IP to communicate with hosts when only neighboring hosts' addresses are known 1: Physical - none (many standards by which you transmit data via fiber optic or copper wire, etc.)

IPv4, IPv6, DHCP ARP -"address resolution protocol"

SQL Joins ________ join only includes rows that match both tables

Inner

History of MUMPS (M-code is from this) -Described in 1969 by Greenes, Pappalardo, Marble, and Barnett -"MGH Utility Multi-Programming System" -Design goals -Flexible interface (e.g. lab systems, notes, variable output format) -Variable length text-handling -Hierarchical design to support complexity of clinical data and update/retrieval methods -Multi-user access (the original paper recognized the potential for conflicting updates and the need for ACID (atomicity, consistency, isolation, durability) transactions) -Large storage capacity -Low CPU usage -A high-level programming language to make interface design less time-consuming, more efficient -MUMPS renamed "M" in 1993 by M Technology Association, recognized by ANSI in 1995 (as an accepted standard) -MUMPS and its derivatives, such as ________________, are among the most widely used transactional DBs for EHRs today, also finance, banking, etc. -Design of MUMPS predated and anticipated the "NoSQL" and "schema-less DB" movement, where you can add attributes and relations on the fly without having to rewrite the entire DB schema

Intersystems Caché

Examples of Protocols by OSI Layer Layer: Example Protocols 7: Application - POP3/IMAP4 for email, HTTP for web content, FTP for file transfer, SSH and HTTPS for secure browsing 6: Presentation - Encryption, decryption, conversion to character sets (like ASCII); also Unicode 5: Session - __________ for authenticating users against X.500 directories; authenticate users to access a specific session on a website 4: Transport - TCP "Transmission Control Protocol" (with acknowledgement), UDP "User Datagram Protocol" (without acknowledgement) 3: Network - IPv4, IPv6, DHCP "Dynamic Host Configuration Protocol", by which your router assigns IP addresses to hosts, for example, when you connect to a wireless hotspot 2: Data Link - ARP "Address Resolution Protocol" used by TCP/IP to communicate with hosts when only neighboring hosts' addresses are known 1: Physical - none (many standards by which you transmit data via fiber optic or copper wire, etc.)

LDAP "Lightweight Directory Access Protocol"

Bluetooth Standard -Historically maintained by IEEE as 802.15.1, but now maintained by Bluetooth SIG (special interest group) -Bluetooth 4.0 standard introduced Bluetooth Low Energy (aka Bluetooth Smart or Bluetooth LE) -Bluetooth ______ has recently become very popular in health and fitness -Healthcare-specific profiles for blood pressure, thermometer, glucose monitor, continuous glucometry -Fitness-specific profiles for weight scale, running/cycling speed, heart rate, etc.

LE

SQL Joins ________ join will include all rows in the left table, display blanks from the right. "Left" ALWAYS refers to the table in the ________ statement.

Left outer join FROM
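
A left outer join can be tried out with sqlite3 as below. The patient and orders tables are hypothetical; patient is the "left" table because it is the one named in the FROM clause, and the "blanks" from the right table come back as NULL (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (pat_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, pat_id INTEGER, med TEXT);
INSERT INTO patient VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO orders VALUES
    (10, 1, 'acetaminophen'), (11, 1, 'topiramate'), (12, 3, 'ibuprofen');
""")

# Every patient row is kept, even when no order matches.
left_rows = conn.execute(
    "SELECT p.name, o.med FROM patient p "
    "LEFT OUTER JOIN orders o ON p.pat_id = o.pat_id "
    "ORDER BY p.pat_id, o.order_id"
).fetchall()
```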

MUMPS Global Variables -"[H]ierarchically organized, symbolically accessed" structure -KEY/VALUE database -_________ variables are defined in the scope of the program -_________ variables referenced by an up arrow symbol (later became a caret "^") -This code addresses the patient in the Active Patient Record (APR) global whose key matches the local variable "UN" (hospital unit number, or location of patient) and assigns the name and age: SET ^APR(UN, NAME)="DOE, JOHN", ^APR(UN, AGE)="34"

Local Global
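
As a rough analogy only (this is Python, not MUMPS), a global can be pictured as a nested key/value store indexed by subscripts. The APR dictionary, the set_global helper, and the "UNIT" subscript are hypothetical, and unlike a real global this in-memory dict is not persisted.

```python
from collections import defaultdict

# Stand-in for the ^APR global: a hierarchical key/value store.
# A plain in-memory dict is used here, so nothing is actually persisted.
APR = defaultdict(dict)

def set_global(store, unit_number, subscript, value):
    """Analogue of SET ^APR(UN, subscript)=value."""
    store[unit_number][subscript] = value

un = "1001"  # local variable holding the unit number (hypothetical value)
set_global(APR, un, "NAME", "DOE, JOHN")
set_global(APR, un, "AGE", "34")
# New subscripts can be added on the fly without redefining a schema.
set_global(APR, un, "UNIT", "4A")

record = dict(APR[un])
```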

Two different types of CODEC.

Lossy - at expense of compressing it, you are losing fidelity, which means you can never regain the full detail of the original source Lossless - allows you to maintain the highest possible fidelity even when you compress it
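
The difference can be demonstrated on raw bytes. zlib (from Python's standard library) is a lossless compressor; the "lossy" step is a toy quantization invented for this sketch, not a real audio or video codec.

```python
import zlib

original = bytes(range(256)) * 4

# Lossless: decompressing returns the exact original bytes.
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

# Lossy (toy example): keep only the top 4 bits of each byte.
# The discarded low bits cannot be recovered afterwards.
quantized = bytes(b & 0xF0 for b in original)
distinct_levels = len(set(quantized))
```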

The historical prototype of a hierarchical database, still used in healthcare, is ________

MUMPS (Massachusetts General Hospital Utility Multi-Programming System), from the 1960s

Common features of DBMS -Define data types, structures, constraints (which type of data can go into which types of cells) -Construct data tables, store data on a storage medium -What 4 functions are needed at minimum? _______________ -Share data via permissions and user access control; control concurrency (permissions are granular: some users can only read, some can create tables, etc.; when two people access the same table, the DBMS can handle concurrency, for example by making one wait until the other is done) -Protect against inappropriate access, hardware/software failure -Maintain & optimize data structures (such as indexes, de-duplicating results, and cascading deletes, i.e., deleting a row in one table also removes the related rows in other tables)

Manipulate data to create (insert), retrieve (read), update (edit), delete (sometimes abbreviated "CRUD")

_____________: -Used to establish communication between two electronic devices -_______ tags passively store data, some can be written to by an ______ device. -Typical uses = phone-enabled payment (credit card information; tap to pay), PIN storage

NFC = Near Field Communication NFC; NFC

Wireless Applications -IP Telephony ("Vocera" devices in healthcare) -SMS text messaging -Various "secure" texting solutions (HIPAA compliant) -There is ____ secure SMS standard; all of these "secure" texting solutions use proprietary methods for data transfer -RFID/NFC tagging of medical devices, patients (interesting applications for DME: with an RFID tag attached, its location can be detected passively as it passes certain gateways, or through the use of Bluetooth beacons, so motion from one location to another can be tracked through the healthcare system)

NO

_______________ in SQL -Mimic an "inner join" using ____________ syntax -Substitute the results of a subquery for the list of values in a "where [column] in" clause. Example: suppose you want all meds ordered for patients between ages 4 and 5. The data are in two tables, "patients" and "medications", with "pat_id" as the PK in patients and an FK in medications. First, you identify all patients between 4 and 5; this is the subquery. Then you pass the results of the subquery as a list of values to the _______ operator: select * from medications where pat_id in (select pat_id from patients where pat_age between 4 and 5) The asterisk is a ______ for every attribute

Nested Subqueries nested subquery "IN" wild card
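
The query from the card can be run as-is against a small sqlite3 database. The table and column names pat_id and pat_age come from the card; med_id, med_name, and the sample rows are assumptions added for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (pat_id INTEGER PRIMARY KEY, pat_age INTEGER);
CREATE TABLE medications (
    med_id INTEGER PRIMARY KEY,
    pat_id INTEGER REFERENCES patients(pat_id),
    med_name TEXT
);
INSERT INTO patients VALUES (1, 4), (2, 5), (3, 40);
INSERT INTO medications VALUES
    (100, 1, 'amoxicillin'), (101, 2, 'albuterol'), (102, 3, 'topiramate');
""")

# Nested subquery: the inner select produces the list for the IN operator.
subquery_rows = conn.execute(
    "select * from medications where pat_id in "
    "(select pat_id from patients where pat_age between 4 and 5) "
    "order by med_id"
).fetchall()

# The same result expressed as an inner join.
join_rows = conn.execute(
    "select m.* from medications m "
    "inner join patients p on m.pat_id = p.pat_id "
    "where p.pat_age between 4 and 5 order by m.med_id"
).fetchall()
```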

OSI Seven Layer Model Host -Data [transmission of message via HTTP, SMTP, FTP] 7. Application (HTTP is an application layer protocol) 6. Presentation / Syntax (e.g. character encoding in ASCII or data encryption) 5. Session (ex: a web conference may have a persistent session to sync audio and video; e.g., you may need to authenticate to join a web session, and that link expires once you disconnect, so you can't join again without re-authenticating) -Segments 4. Transport [transmission of segments via TCP, UDP] Media -TCP Packet / UDP Datagram 3. __________ [transmission of packets via IP (Internet Protocol), DNS (Domain Name System) servers, through routers] -Frame 2. __________ [transmission of frames via Ethernet or PPP] -Bit 1. __________ [transmission of binary bits via copper wire, coaxial, or fiber optic cable]

Network Data Link Physical

___________ - -Patterns of links between elements of a computer network -Choice of topology determines fault tolerance, redundancy, and scalability -Phone telephony is an example of a point-to-point connection, so is connection between your CPU and the hard-drive -___________ -Star, Tree -____________-Mesh, Fully Connected

Network Topologies Centralized Decentralized

_____________: -Techniques of structuring tables to reduce redundancy, dependency between tables -Consider the "Flat File" example from earlier -One of the "Medication" cells has 2 entries -This violates a rule known as 1st Normal Form (1NF) Goals of ____________ -To free the collection of relations from undesirable insertion, update, and deletion dependencies -To reduce the need for restructuring the collection of relations as new types of data are introduced, and thus increase the lifespan of application programs -To make the relational model more informative to users -To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by

Normalization Normalization
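
The first step, fixing a cell that holds 2 medications, can be sketched in plain Python. The flat rows, the semicolon separator, and the medication names are hypothetical; only the idea of splitting a multi-valued cell into one value per row comes from the card.

```python
# Hypothetical flat-file rows; the first medication cell holds two values.
flat_rows = [
    {"patient_id": 1, "name": "Louise Chen",
     "medication": "acetaminophen; topiramate"},
    {"patient_id": 2, "name": "John Doe", "medication": "ibuprofen"},
]

def to_first_normal_form(rows):
    """Split multi-valued medication cells so each cell holds a single value."""
    result = []
    for row in rows:
        for med in row["medication"].split(";"):
            result.append(
                {"patient_id": row["patient_id"], "medication": med.strip()}
            )
    return result

prescriptions = to_first_normal_form(flat_rows)
```

The patient's name is left out of the new rows on purpose: it would live once in a separate patient table and be linked by patient_id.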

__________________: This is a 7-tiered description of the various protocols used in internet communications. Two top level ways to think about these, what happens on _____ side and what happens on ________ side

OSI Seven Layer Model Host Media

______________: -Data represented as data objects (as in OOP, objects can have attributes and methods) -Traditionally support more data types (graphics, photo, video, webpages) -Object DBs are usually ___________ into the programming language, so accessing data doesn't require complex driver configuration -Increased use recently with the development of web applications; most web application frameworks support interaction with an OODBMS (object-oriented DB management system) -There's a blurred line between hierarchical and object-oriented DBs -Commercial example: ____________ behind the Epic EHR

Object Databases integrated Intersystems Caché - the OODBMS

Data Storage Strategies: Terms to Know Location: -______________ -you own and operate the data center -______________ -you lease a data center and manage the servers, but not the facility -______________ -outsource the data center and server management, but you manage the database itself Storage Medium: -Hard drive (cheap, but slower and prone to failure w/ moving parts) -often put into ____________ with various techniques for redundancy such as mirroring, parity checks -__________ drives (expensive, smaller, faster) -Tape or other medium Tradeoff of durability vs redundancy cost vs speed -Ex: AWS (Amazon Web Services) "S3 Glacier Deep Archive" vs. "S3 Standard" -Both highly available, but Glacier is designed for very infrequent access: slow (minutes/hours), very inexpensive. S3 Standard, in contrast, has latency of milliseconds and can be used for real-time production applications

On premises "on prem" Co-location Cloud Redundant Array of Inexpensive Disks (RAID) Solid state

Examples of Protocols by OSI Layer Layer: Example Protocols 7: Application - _________ for email, _______ for web content, ______ for file transfer, ______________ for secure browsing 6: Presentation - Encryption, decryption, conversion to character sets (like ASCII); also Unicode 5: Session - LDAP "Lightweight Directory Access Protocol" for authenticating users against X.500 directories; authenticate users to access a specific session on a website 4: Transport - TCP "Transmission Control Protocol" (with acknowledgement), UDP "User Datagram Protocol" (without acknowledgement) 3: Network - IPv4, IPv6, DHCP "Dynamic Host Configuration Protocol", by which your router assigns IP addresses to hosts, for example, when you connect to a wireless hotspot 2: Data Link - ARP "Address Resolution Protocol" used by TCP/IP to communicate with hosts when only neighboring hosts' addresses are known 1: Physical - none (many standards by which you transmit data via fiber optic or copper wire, etc.)

POP3/IMAP4; HTTP; FTP; SSH and HTTPS

Medium Range Wireless Standards Medium Range WLAN -Wireless Local Area Network -802.11b (the OG of these wireless protocols) -Max 11 Mbps, interferes with other 2.4 GHz devices like microwaves (so when you turned on your microwave, your wireless device would drop connectivity), Bluetooth, cordless phones. ___________ -802.11g -Max 54 Mbps, same band as 802.11b, same interference concern. -802.11n -uses both the 2.4 GHz and 5 GHz spectrum for max speeds of 54 Mbps and 600 Mbps respectively. Speed enhanced by MIMO (Multiple Input, Multiple Output) -802.11ac -standard for "gigabit wifi" -1 Gbps

Popularized WiFi

Three key steps in Data Mining 1. ____________ - (take messy operational and transactional data and you) clean, integrate, select, transform 2 .___________ 3. ______________ back to the user

Pre-processing Extraction Evaluation & Presentation

_________________: an attribute that uniquely identifies a tuple (row)

Primary key

Short Range Wireless Standards Short Range PAN -Personal Area Network -__________ (one way) (e.g., tags you use when you microchip pets) and ______ (two way; e.g., contactless payment (google pay, apple pay)) -IEEE 802.15 -Wireless Personal Area Network and derivatives (_________ & Infrared Data Association or IrDA)

RFID; NFC Bluetooth

_______________: -Defines association within and between relations (relation ~ table; each table is its own relation) -Each attribute (attribute ~ column) corresponds to a domain in the relation -Each tuple (tuple ~ row) describes an ordered list of elements, the order is important -Data elements (element ~ cell) have a data type that is consistent across that attribute. (VARCHAR, INT, DATE, LONG, etc) -Attributes can also have constraints (non NULL, auto-incrementing (e.g., every time you add a patient to the table, it adds 1 to that index, so every patient has a distinct and sequential number assigned to them), cascading delete, Primary Key, Foreign Key) beyond the type constraint -Create and describe structure/constraints using "Data Definition Language" (DDL) which contains metadata (data about the data) -Further describe the data using a Data Dictionary (not just PK/FK, constraints, but also definitions of each field and its intended use)

Relational Database (one step up from flat file)

SQL Joins _________ join will include all rows in the right table, display blanks from the left The one in the _______ statement is ALWAYS the LEFT

Right outer FROM

Types of Network Topologies

Ring Mesh Star Fully Connected Line Tree Bus

Telecommunications *Telephony has changed rapidly in the past decade *Popularization of mobile, wifi, and VOIP *Video conferencing -Web conferencing / collaboration via H.264 ___________ -This is the same codec, known as MPEG-4, used for distribution of video content over IP, like YouTube. CODEC = coder / decoder -a compression algorithm used on a digital stream to transmit audio, video, etc. Codecs can be "lossy" or "lossless". Lower bitrate often means lower fidelity.

Scalable Video Coding (SVC)

_________ Normal Form A DB is said to be in ______ if it meets the following conditions: • The table must be in 1NF and... • The table must have a single-column, non-composite primary key (a common simplification; formally, no non-key attribute may depend on only part of a composite key) • A table in this form is still susceptible to certain INSERT, DELETE, and UPDATE anomalies: -INSERT: We can't indicate a patient's pharmacy unless there is a medication prescribed -DELETE: If you delete the last row that contains med "9906761", you no longer know if it's on formulary -UPDATE: updating the formulary status for a Medication_ID may require updating multiple rows

Second 2NF

Bluetooth Low Energy (BLE) aka "Bluetooth __________" -Low battery consumption, limited need for data transfer -One-way communication in close proximity -BLE Beacons broadcast packets of data at regular intervals, and devices (like Smartphones) pick them up, detected by pre-installed apps or services. -Uses: indoor navigation, proximity-based marketing

Smart

_______________ -a family of related languages with different dialects specific to the RDBMS (MS-SQL, Oracle SQL, MySQL) (MySQL is one of the open source RDBMS) -English-like with key words that allow for all functions common to DBMS ("_______" functions): INSERT INTO [table] VALUES ________; SELECT [columns] FROM [table] WHERE ________; UPDATE [table] SET [column] = [value] WHERE __________; DELETE FROM [table] WHERE [column] = [value]

Structured Query Language "SQL" CRUD [tuple] [constraints] [condition]
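The four CRUD statements on this card can be tried end to end with Python's built-in sqlite3 module. This is only a sketch: the `patient` table, its columns, and the sample rows are hypothetical, and other RDBMS dialects may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, last_name TEXT, city TEXT)")

# Create: INSERT INTO [table] VALUES [tuple]
conn.execute("INSERT INTO patient VALUES (1001, 'Smith', 'Boston')")
conn.execute("INSERT INTO patient VALUES (1002, 'Desai', 'Denver')")

# Read: SELECT [columns] FROM [table] WHERE [constraints]
after_insert = conn.execute("SELECT last_name FROM patient WHERE city = 'Boston'").fetchall()

# Update: UPDATE [table] SET [column] = [value] WHERE [condition]
conn.execute("UPDATE patient SET city = 'Austin' WHERE patient_id = 1002")

# Delete: DELETE FROM [table] WHERE [column] = [value]
conn.execute("DELETE FROM patient WHERE last_name = 'Smith'")

remaining = conn.execute("SELECT patient_id, last_name, city FROM patient").fetchall()
print(after_insert)
print(remaining)
```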

________ is a connection-oriented protocol for bidirectional communication, and it includes acknowledgement.

TCP

Examples of Protocols by OSI Layer (Layer: Example Protocols)
7: Application - POP3/IMAP4 for email, HTTP for web content, FTP for file transfer, SSH and HTTPS for secure browsing
6: Presentation - Encryption, decryption, conversion to character sets (like ASCII); also Unicode
5: Session - LDAP "Lightweight Directory Access Protocol" for authenticating users against X.500 directories; authenticate users to access a specific session on a website
4: Transport - ____________
3: Network - IPv4, IPv6, DHCP "Dynamic Host Configuration Protocol", by which a router assigns IP addresses to hosts (for example, when you connect to a wireless hotspot)
2: Data Link - ARP "address resolution protocol", used to find the hardware (MAC) address of a neighboring host when only its IP address is known
1: Physical - none (many standards by which you transmit data via fiber optic or copper wire, etc.)

TCP "Transmission Control Protocol" (with acknowledgement), UDP "User Datagram Protocol" (without acknowledgement)

The TCP/IP Stack- the backbone of the internet -Network protocol used for internet communications -TCP= transmission control protocol -IP = internet protocol -UDP (user datagram protocol) is an alternative to TCP -Key differences -______ requires acknowledgement, ______ does not -_____ guarantees sequence/order of packets, ______ does not -______ used where packet loss is unacceptable (will resend until acknowledgement or timeout) -______ used where packet loss is less important (Voice over IP aka "VoIP" or streaming protocols)

TCP ; UDP TCP; UDP TCP UDP
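The acknowledgement difference can be illustrated with a toy simulation. This is not real TCP or UDP and opens no sockets; `lossy_channel`, `send_without_ack`, and `send_with_ack` are hypothetical helpers, and the channel is assumed to lose exactly the second transmission attempt. Real TCP also uses sequence numbers to restore packet order, which this sketch leaves out.

```python
def lossy_channel(drop_attempts):
    """Return a send function that loses the attempts whose 1-based number is in drop_attempts."""
    state = {"attempt": 0}

    def send(packet, inbox):
        state["attempt"] += 1
        if state["attempt"] in drop_attempts:
            return False      # packet lost, so no acknowledgement comes back
        inbox.append(packet)
        return True           # receiver got the packet and acknowledged it

    return send


def send_without_ack(packets, send):
    """UDP-like: transmit each packet once and never check for delivery."""
    inbox = []
    for packet in packets:
        send(packet, inbox)
    return inbox


def send_with_ack(packets, send, max_retries=5):
    """TCP-like: resend each packet until it is acknowledged or retries run out."""
    inbox = []
    for packet in packets:
        for _ in range(max_retries):
            if send(packet, inbox):
                break
        else:
            raise TimeoutError(f"no acknowledgement for {packet!r}")
    return inbox


packets = ["p1", "p2", "p3", "p4"]
udp_like = send_without_ack(packets, lossy_channel({2}))
tcp_like = send_with_ack(packets, lossy_channel({2}))
print(udp_like)  # p2 is silently missing
print(tcp_like)  # p2 was resent after the lost attempt
```

The UDP-like sender finishes sooner but delivers an incomplete stream, which is acceptable for VoIP or streaming; the TCP-like sender pays for the resend to deliver everything.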

The __________ - the backbone of the internet -Network protocol used for internet communications -TCP= transmission control protocol -IP = internet protocol -UDP (user datagram protocol) is an alternative to TCP -Key differences -TCP requires acknowledgement, UDP does not -TCP guarantees sequence/order of packets, UDP does not -TCP used where packet loss is unacceptable (will resend until acknowledgement or timeout) -UDP used where packet loss is less important (Voice over IP aka "VoIP" or streaming protocols)

TCP/IP Stack

HTTP requests at the Application layer are transmitted via _______. In contrast _______ is a connectionless protocol and does not require acknowledgement.

TCP; UDP

________ Normal Form A DB is said to be in _______ if it can meet the following conditions: -Table meets all criteria for 1NF AND 2NF AND -Table must have ____ "transitive functional dependencies", meaning -changing the value of one cell should not require a change to another row. (In the 2NF example, note that changing the value of a medication ID could require a change to the "Formulary" attribute) • susceptible to certain INSERT, DELETE, and UPDATE anomalies! • Example: What if there was a registration error and patient 1001 and patient 1003 are actually the same patient? How can you avoid changing multiple cells in the first table?

Third 3NF NO

Normalization & Denormalization -The "Normal Forms" were described by Codd and Boyce, who described techniques to reduce inconsistencies and dependencies in relational databases. These forms are named numerically 1NF, 2NF, 3NF, BCNF, 4NF, 5NF, and 6NF. -For a practical tutorial on database normalization, see here: https://www.guru99.com/database-normalization.html -In practice, a database that is in __________ can be called "normalized" -Normalized DBs are safe against most INSERT, UPDATE, and DELETE anomalies, however, to generate a report, you have to "__________" the data -requires lots of PK & FK "JOIN" logic in your query -For high-performance RDBMS apps, ___________ schema may be preferable to allow single-table lookup functions with an index, to avoid additional JOINs and full-table scans (e.g., if you are commonly looking up pt meds, you might intentionally have a table that's ____________ as it lists both the patients and the medications in one table, rather than have a separate patient table and a separate meds table, which you would expect from a very normalized schema) -Also, you need to ___________ data to aggregate the data into meaningful groups or reports (eg: all meds f

Third Normal Form (3NF) (There are higher forms of normalization beyond 3NF, like "Boyce-Codd Normal Form" (abbreviated "BCNF")) denormalize denormalized denormalized denormalize denormalized
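A small sketch of denormalizing with JOIN logic, using Python's built-in sqlite3 module. The three normalized tables (`patient`, `medication`, `prescription`), the reporting table `patient_med_report`, and all rows are hypothetical examples, not the schema from the course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE medication (med_id INTEGER PRIMARY KEY, med_name TEXT, on_formulary INTEGER);
    CREATE TABLE prescription (
        patient_id INTEGER REFERENCES patient(patient_id),
        med_id     INTEGER REFERENCES medication(med_id)
    );
    INSERT INTO patient VALUES (1001, 'Ada'), (1002, 'Ben');
    INSERT INTO medication VALUES (1, 'aspirin', 1), (2, 'statin', 0);
    INSERT INTO prescription VALUES (1001, 1), (1001, 2), (1002, 1);

    CREATE TABLE patient_med_report AS
        SELECT p.patient_id, p.name, m.med_name, m.on_formulary
        FROM prescription AS rx
        JOIN patient AS p ON p.patient_id = rx.patient_id
        JOIN medication AS m ON m.med_id = rx.med_id;
""")

# Single-table lookup against the denormalized table: no JOIN needed at query time.
report = conn.execute(
    "SELECT name, med_name FROM patient_med_report WHERE patient_id = 1001 ORDER BY med_name"
).fetchall()

# Aggregating into meaningful groups, here prescriptions per medication.
counts = conn.execute(
    "SELECT med_name, COUNT(*) FROM patient_med_report GROUP BY med_name ORDER BY med_name"
).fetchall()
print(report)
print(counts)
```

The trade-off is the one the card describes: the reporting table repeats patient and medication values, so it is fast to read but must be rebuilt or updated carefully when the normalized tables change.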

What tools do developers use when they are coming up with requirements for their database or program? They tend to fall in the category of _______ tools

UML (Unified Modeling Language)

_____________: Standard toolset for describing aspects of databases, software, business processes. A few examples below: -Class diagram to describe OO classes (visual depiction of the following: name, hierarchies, attributes, methods) -Activity diagram ~ process flowchart, stepwise description of decisions, consequences, inputs, outputs -Use Case diagram -describes actors (participants in an activity), goals, dependencies (btw goals/actor). Has stick figures in them. - Entity-Relationship (ER) diagram (as in Caboodle) - describes objects and their relationships. Can then be used to define RDBMS logical schema, which DB programmers can use to build physical schema of a DB

Unified Modeling Language (UML)

Long Range Wireless Standards Alphabet Soup & Confusing Marketing Alert! WiMax CDMA 3G 4G / 4G LTE (30-50Mbps) 5G --> emerging standard promising hundreds of Mbps over cellular. Could replace _______ (e.g. you place a 5G receiver near the window of your house, connect that to a residential router); could replace a residential router that uses fiber optic or coaxial cable; it promises speeds almost as fast as wifi even if you are not near a broadband-enabled site

WiFi

MUMPS global variables are accessible to _______ part of the application

any

____________ refers to the fact that a database transaction is an atomic event - it must either completely take place or not take place.

atomicity (Consider an "UPDATE" statement that is meant to modify the values of 15,000 rows in a database. If the RDBMS suffers a hardware failure during the update, the system must be able to guarantee that either all 15,000 rows were updated or none were updated; a partial transaction would lead to potential data inconsistencies. In this sense, the transaction is "atomic" and indivisible.)
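All-or-nothing behavior can be sketched with Python's built-in sqlite3 module, where a `with conn:` block commits if it finishes and rolls back if it raises. The `account` table and the `transfer` function are hypothetical, and the "failure" is a simulated exception rather than a real hardware fault.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO account VALUES (1, 100), (2, 100)")
conn.commit()


def transfer(conn, amount, fail_midway):
    """Move amount from account 1 to account 2 as one transaction."""
    with conn:  # commit on success, roll back on any exception
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = 1", (amount,))
        if fail_midway:
            raise RuntimeError("simulated failure between the two updates")
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = 2", (amount,))


def balances(conn):
    return conn.execute("SELECT balance FROM account ORDER BY id").fetchall()


try:
    transfer(conn, 50, fail_midway=True)
except RuntimeError:
    pass
after_failure = balances(conn)   # the first update was rolled back

transfer(conn, 50, fail_midway=False)
after_success = balances(conn)   # both updates were committed together
print(after_failure, after_success)
```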

UML E-R Diagram: Based on the E-R Diagram, a developer can: •describe the logical schema for the database •create physical schema and DDL (Data Definition Language used to describe structure/constraints which also contain metadata)/SQL code to create tables •create object classes that map to database tables •map object classes to DB tables using an ORM (object relational mapping) tool DDL - describes the portion of SQL that ______, ______, ______ database objects

creates, alters, deletes
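The three DDL actions correspond to the SQL statements CREATE, ALTER, and DROP ("deletes" on the card is written DROP in SQL). A brief sketch with Python's built-in sqlite3 module follows; the `patient` table and its columns are hypothetical, and `PRAGMA table_info` and `sqlite_master` are SQLite-specific ways to read the metadata.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def columns(table):
    """Column names of a table, read from SQLite's metadata."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def tables():
    """Names of all tables currently defined in the database."""
    return [row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]


# Create a database object.
conn.execute("CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, last_name TEXT NOT NULL)")
cols_after_create = columns("patient")

# Alter it: add an attribute by adding a column.
conn.execute("ALTER TABLE patient ADD COLUMN twitter_handle TEXT")
cols_after_alter = columns("patient")

# Delete it.
conn.execute("DROP TABLE patient")
tables_after_drop = tables()

print(cols_after_create, cols_after_alter, tables_after_drop)
```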

Regarding ACID: Consistency -transaction meets all ___________ (can't add a DATE to an INT field, can't have a non-unique PK)

constraint rules
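A short sketch of constraint rules being enforced, using Python's built-in sqlite3 module with a hypothetical `patient` table. It shows a non-unique primary key and a NULL in a NOT NULL column being rejected. The card's DATE-into-INT example is not shown because SQLite's flexible typing does not reject it by default, unlike stricter RDBMS products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, last_name TEXT NOT NULL)")
conn.execute("INSERT INTO patient VALUES (1001, 'Smith')")

rejected = []
candidates = [
    (1001, "Jones"),   # duplicate primary key
    (1002, None),      # NULL in a NOT NULL column
    (1003, "Desai"),   # valid row
]
for row in candidates:
    try:
        conn.execute("INSERT INTO patient VALUES (?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row[0])   # the database refused the inconsistent row

stored_ids = [r[0] for r in conn.execute("SELECT patient_id FROM patient ORDER BY patient_id")]
print(rejected, stored_ids)
```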

Relational Database (one step up from flat file) -Defines association within and between relations (relation ~ table; each table is its own relation) -Each attribute (attribute ~ column) corresponds to a domain in the relation -Each tuple (tuple ~ row) describes an ordered list of elements, the order is important -Data elements (element ~ cell) have a data type that is consistent across that attribute. (VARCHAR, INT, DATE, LONG, etc) -Attributes can also have __________ (non NULL, auto-incrementing (e.g., every time you add a patient to the table, it adds 1 to that index, so every patient has a distinct and sequential number assigned to them), cascading delete, Primary Key, Foreign Key) beyond the type constraint -Create and describe structure/constraints using ___________ which contains _________ (data about the data) -Further describe the data using a ___________ (not just PK/FK, constraints, but also definitions of each field and its intended use)

constraints "Data Definition Language" (DDL) metadata Data Dictionary
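The constraints named on this card can be written out as DDL. The sketch below uses Python's built-in sqlite3 module; the `patient` and `prescription` tables are hypothetical, and the `PRAGMA foreign_keys = ON` line is a SQLite-specific requirement for foreign keys to be enforced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE patient (
        patient_id INTEGER PRIMARY KEY AUTOINCREMENT,   -- primary key, auto-incrementing
        last_name  TEXT NOT NULL                        -- non-NULL constraint
    );
    CREATE TABLE prescription (
        rx_id      INTEGER PRIMARY KEY AUTOINCREMENT,
        patient_id INTEGER NOT NULL
                   REFERENCES patient(patient_id) ON DELETE CASCADE,  -- foreign key, cascading delete
        drug       TEXT NOT NULL
    );
""")

# Each new patient receives the next sequential number without supplying one.
ids = [
    conn.execute("INSERT INTO patient (last_name) VALUES (?)", (name,)).lastrowid
    for name in ("Smith", "Desai")
]
conn.execute("INSERT INTO prescription (patient_id, drug) VALUES (?, 'aspirin')", (ids[0],))

# Deleting the patient also deletes the prescriptions that reference them.
conn.execute("DELETE FROM patient WHERE patient_id = ?", (ids[0],))
rx_left = conn.execute("SELECT COUNT(*) FROM prescription").fetchone()[0]
print(ids, rx_left)
```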

Regarding ACID: Durability -system must be _________ to failure (ex: RDBMS has queued 200 transactions in memory, and power fails. How do you know if all 200 transactions took place?). If the power fails, the system must know how many of those were committed and how many were not, whether the full transaction took place or didn't

tolerant

Regarding ACID: Atomicity -transaction is indivisible, it either ____________, no possibility of a partial transaction (ex: a DB transaction that updates 2 cells -it either does both or neither)

happens or it doesn't (either commit a change to the DB or don't)

Regarding ACID: Isolation - RDBMS must be able to sequence simultaneous transactions (ex: 2 transactions to update the same cell. Both must take place, but ____________, or else you have a write-write failure). The goal is to isolate one read-write event from another read-write event; otherwise you have a collision and a write-write failure

not at same time

RFID RFID = Radio Frequency Identification 3 "flavors": ______________ -Passive relies on power from the reader, but reader has to emit 1000x stronger signal to elicit the response from the RFID itself -Tags are read-only or read-write -RFID reader sends a signal to interrogate tag -RFID tag/chip responds with ID and other info -Like tags, readers can be active or passive -Uses: animal tags, "Smart cards," asset tracking

passive, active, battery-assisted passive

Relational Database (one step up from flat file) -Defines association within and between ______________ -Each ___________ corresponds to a domain in the relation -Each __________ describes an ordered list of elements, the order is important -Data ____________ have a data type that is consistent across that attribute. (VARCHAR, INT, DATE, LONG, etc) -Attributes can also have constraints (non NULL, auto-incrementing (e.g., every time you add a patient to the table, it adds 1 to that index, so every patient has a distinct and sequential number assigned to them), cascading delete, Primary Key, Foreign Key) beyond the type constraint -Create and describe structure/constraints using "Data Definition Language" (DDL) which contains metadata (data about the data) -Further describe the data using a Data Dictionary (not just PK/FK, constraints, but also definitions of each field and its intended use)

relations (relation ~ table; each table is its own relation) attribute (attribute ~ column) tuple (tuple ~ row) elements (element ~ cell)

Relational Database -The relation ____________ is a description of the relation, its attributes, and the data types / rules associated with the relation. -A specific table that uses that schema is an _________ of that schema -Adding a new relation is analogous to adding a new table; add an attribute by adding a column (e.g., to add every patient's Twitter handle to their medical record identifier, you add a column to that table) -In very simple terms these make it easy to know "everything that has one attribute" Ex: "find all patients born in 1974"

schema (table schema) instance

_________ quotes around text strings in SQL

single

SQL Wildcards: "%" matches any length, "_" must match a ________ character. This expression: WHERE lastName LIKE 'Smith_' ...would match "Smithe", "Smiths", "Smithy". LIKE is ____ sensitive, so you may need to case-correct the string before matching: UPPER([char]) --> converts [char] to all upper-case; LOWER([char]) --> converts [char] to all lower-case. This expression: WHERE LOWER(lastName) LIKE 'desa%' ...would match "Desai", "DeSai", "desai", "DeSalles", etc...

single case
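Both wildcards can be tried with Python's built-in sqlite3 module. The `patient` table and the list of names are hypothetical. One caveat: SQLite's own LIKE is case-insensitive for ASCII letters by default, so whether LIKE is case sensitive depends on the RDBMS and its settings; wrapping the column in lower() as the card does works either way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (lastName TEXT)")
names = ["Smith", "Smithe", "Smiths", "Smithers", "Desai", "DeSai", "desai", "DeSalles", "Dean"]
conn.executemany("INSERT INTO patient VALUES (?)", [(n,) for n in names])


def match(condition):
    """Return last names satisfying the given WHERE condition, in insertion order."""
    rows = conn.execute(f"SELECT lastName FROM patient WHERE {condition} ORDER BY rowid")
    return [r[0] for r in rows]


# "_" matches exactly one character: "Smith" is too short, "Smithers" too long.
one_char = match("lastName LIKE 'Smith_'")

# "%" matches any length; lower() case-corrects the column before matching.
any_length = match("lower(lastName) LIKE 'desa%'")

print(one_char)
print(any_length)
```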

Healthcare Applicability & Challenges -Retrofitting older facilities with equipment is expensive -Bandwidth limitations as amount of data increases -MRI wrist = 5MB -CT 3D reconstruction skull = 120MB -CT angiogram = 230MB -Human genome = 850MB (I've seen stats as high as 1.5GB) -Challenges with compression, image quality, and transmission -Keeping up with demand -more and more "ologies" -Network security -distinct wireless networks for telephony, hospital applications, guest applications. VPN and remote access (e.g., hacking a ventilator; so best practice is to keep wireless networks distinct, keep the patient guest network distinct from that of patient care, and keep VoIP networks separate from your other telecommunications networks; having separate VPN and remote access into the hospital's network is critical) -Both RFID and NFC pose "__________" concerns (skimming: if all it takes to read a credit card number is a passive reader, could you disclose important information just by walking past one?) -"Bring Your Own Device" -everyone has a personal device they'd like to use at work (one way to mitigate the risk of unmonitored devices is to use strategies like MDM (______

skimming mobile device management

