Exam 1 - Generative AI and Chapters 1-3
What is the purpose of Data Reduction?
The purpose of Data Reduction is to reduce the size of a data set to a more manageable and suitable size for business analysis.
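A minimal Python sketch of data reduction. It assumes a small hypothetical data set stored as a list of records with made-up fields (`customer_id`, `region`, `sales`, `notes`). The sketch keeps only the rows and columns needed for one question, which is one simple way to shrink a data set. Sampling and aggregation are others.

```python
# Hypothetical raw data set: each record is one transaction.
records = [
    {"customer_id": 1, "region": "East", "sales": 120.0, "notes": "repeat buyer"},
    {"customer_id": 2, "region": "West", "sales": 80.0, "notes": ""},
    {"customer_id": 3, "region": "East", "sales": 45.5, "notes": "coupon"},
    {"customer_id": 4, "region": "North", "sales": 200.0, "notes": ""},
]

def reduce_data(rows, keep_fields, row_filter):
    """Return only the rows that pass row_filter, with only the keep_fields columns."""
    return [
        {field: row[field] for field in keep_fields}
        for row in rows
        if row_filter(row)
    ]

# Question of interest (assumed): sales in the East and West regions only.
reduced = reduce_data(
    records,
    keep_fields=["region", "sales"],
    row_filter=lambda row: row["region"] in {"East", "West"},
)
print(reduced)
```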
What are hallucinations?
Hallucinations are words, phrases, or claims that the software makes up. The output sounds plausible but is false or not supported by its data.
** know the chart from the textbook that describes the purpose of visualizations, and the common types of visualizations used to meet each purpose **
What does LLM stand for?
Large Language Model
What is "artificial general intelligence"?
AI that can do any intellectual task that a human can
What are the TWO general things to evaluate to determine if augmenting or automating a task is worth pursuing?
1. Technical feasibility: Can AI do it? 2. Business value: How valuable is it for the business?
What are the TWO types of data, and what are their subtypes?
Categorical: Nominal and Ordinal; Numerical: Interval and Ratio
** know the difference between random sampling, stratified random sampling, cluster sampling, and convenience (non-probability) sampling **
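The four sampling methods in the note above can be sketched with Python's standard `random` module. The population, its `region` strata, and its `store` clusters are hypothetical. The fixed seed only makes the sketch repeatable.

```python
import random

# Hypothetical population: 12 customers, each in a region (stratum) and a store (cluster).
population = [
    {"id": i, "region": "East" if i < 6 else "West", "store": i // 3}
    for i in range(12)
]

rng = random.Random(42)  # fixed seed so the sketch gives the same sample every run

# Simple random sampling: every member has an equal chance of selection.
simple = rng.sample(population, 4)

# Stratified random sampling: split into groups (strata), then sample randomly within each.
def stratified(pop, key, per_stratum):
    strata = {}
    for member in pop:
        strata.setdefault(member[key], []).append(member)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample

strat = stratified(population, "region", 2)

# Cluster sampling: randomly pick whole clusters, then keep every member of those clusters.
stores = sorted({m["store"] for m in population})
chosen_stores = rng.sample(stores, 2)
cluster = [m for m in population if m["store"] in chosen_stores]

# Convenience (non-probability) sampling: take whoever is easiest to reach,
# here simply the first four records, with no randomness at all.
convenience = population[:4]
```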
Tableau Prep
- a basic but powerful tool for preparing data for analysis
Data Dictionary
- a centralized repository of information about data containing a separate record for each field or variable in a database
String
- a sequence of one or more characters that is stored as categorical data
OLAP (Online Analytical Processing)
- a computing method that enables users to easily and selectively extract and query data for analysis from different points of view
R or Python
- programming languages that can be used to clean data and conduct business analysis
Tableau
- a data visualization software for advanced analysis of data
Excel
- a spreadsheet software for basic analysis of data
T-Test
- a statistical test that is used to determine if there is a significant difference between the means of two groups or sets of data
ANOVA
- a statistical test that is used to determine if there is a significant difference between the means of three or more groups or sets of data
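The T-Test and ANOVA both compare group means. Below is a minimal Python sketch of their test statistics. It assumes equal variances for the t-test (the pooled version). It stops at the statistic, because turning it into a p-value needs the t or F distribution. In practice, SciPy's `scipy.stats.ttest_ind` and `scipy.stats.f_oneway` report both.

```python
from statistics import mean, variance

def pooled_t_statistic(a, b):
    """Two-sample t statistic, assuming equal variances (pooled variance)."""
    n1, n2 = len(a), len(b)
    # Pooled sample variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5

def one_way_anova_f(*groups):
    """One-way ANOVA F statistic: between-group mean square / within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical data: two groups for the t-test, three groups for ANOVA.
t = pooled_t_statistic([1, 2, 3], [4, 5, 6])
f = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(round(t, 3), round(f, 3))  # → -3.674 27.0
```

A large absolute t or a large F suggests the group means differ by more than the spread within groups would explain.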
Microsoft Power Query
- a tool built into Excel and Power BI that allows users to connect to a variety of different data sources
Histogram
- a visual representation of a frequency distribution
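A histogram is built from a frequency distribution. This Python sketch bins hypothetical exam scores into ranges of width 10 and prints a text histogram. The bin width is an arbitrary choice.

```python
from collections import Counter

def frequency_distribution(values, bin_width):
    """Count how many values fall into each bin [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical exam scores.
scores = [55, 62, 68, 71, 74, 75, 79, 83, 88, 91]
BIN_WIDTH = 10
dist = frequency_distribution(scores, BIN_WIDTH)

# Text histogram: one '#' per observation in each bin.
for start, count in dist.items():
    print(f"{start}-{start + BIN_WIDTH - 1}: {'#' * count}")
```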
Box Plot
- a visual representation of how data are dispersed across quartiles
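A box plot draws the five-number summary: minimum, Q1, median, Q3, and maximum. Below is a minimal Python sketch using `statistics.quantiles` (Python 3.8+). It also applies the common 1.5 * IQR rule for flagging outliers. The data are hypothetical, and other quartile methods can give slightly different values.

```python
from statistics import quantiles

def five_number_summary(values):
    """Minimum, Q1, median, Q3, and maximum: the numbers a box plot draws."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

def outlier_fences(values):
    """Common convention: points beyond 1.5 * IQR from the box are outliers."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1  # interquartile range = width of the box
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [1, 2, 3, 4, 5, 6, 7, 8, 40]  # hypothetical; 40 is far from the rest
summary = five_number_summary(data)
low, high = outlier_fences(data)
outliers = [x for x in data if x < low or x > high]
print(summary, outliers)
```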
Alteryx
- an advanced, powerful tool for preparing data for analysis
Measure
- an attribute that is characterized as numerical
Hadoop
- an open-source framework for storing and processing Big Data
Dimension
- any attribute that is characterized as categorical
Geographic
- any data that can be linked to a map
Data Warehouse
- a repository that integrates data from the different databases across a company
Data Lake
- a repository that integrates data from different sources, stored in its raw (native) format
What is correlation?
- a measure of the relationship between two variables, based on how they change with respect to each other
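Below is a minimal Python sketch of the Pearson correlation coefficient, the most common correlation measure. It ranges from -1 (perfect negative) to +1 (perfect positive). The variable names and values are hypothetical. Python 3.10+ also provides `statistics.correlation`.

```python
from math import sqrt

def pearson_correlation(x, y):
    """Pearson correlation coefficient r, between -1 and +1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Numerator: how x and y move together around their means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical variables.
ad_spend = [1, 2, 3, 4, 5]
revenue = [2, 4, 6, 8, 10]   # rises with ad_spend
returns = [5, 4, 3, 2, 1]    # falls as ad_spend rises
r_pos = pearson_correlation(ad_spend, revenue)
r_neg = pearson_correlation(ad_spend, returns)
print(round(r_pos, 3), round(r_neg, 3))
```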
Microsoft Power BI
- a tool that uses basic and advanced data analytic models and visualizations
What is the common file format to deliver structured TABULAR data?
.csv
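Below is a minimal Python sketch of reading tabular .csv data with the standard `csv` module. The column names and values are hypothetical. The text is read from a string so the example runs without a file.

```python
import csv
import io

# Hypothetical CSV content; with a real file you would use open(path, newline="") instead.
csv_text = """region,month,sales
East,Jan,120.50
West,Jan,80.00
East,Feb,95.25
"""

reader = csv.DictReader(io.StringIO(csv_text))  # first row is used as the column names
rows = list(reader)

# csv returns every value as a string, so numeric columns must be converted explicitly.
total_sales = sum(float(row["sales"]) for row in rows)
print(len(rows), total_sales)
```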
What is the common file format to deliver structured TEXT data?
.txt
What are the THREE general tips for creating a prompt for Generative AI?
1. Be DETAILED and SPECIFIC 2. GUIDE the model to think through its answer 3. EXPERIMENT and ITERATE
What are the FOUR general steps to prepare data for analysis?
1. ENSURE data QUALITY 2. VALIDATE the data for COMPLETENESS and INTEGRITY 3. CLEANSE the data 4. PERFORM preliminary "exploratory analysis"
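The preparation steps above can be sketched in plain Python on a few hypothetical records. The sketch covers steps 2 through 4: validate completeness and integrity, cleanse, and explore. The field names and the "sales cannot be negative" integrity rule are assumptions for illustration.

```python
# Hypothetical raw records with typical quality problems.
raw = [
    {"id": 1, "region": "East", "sales": "120.5"},
    {"id": 2, "region": "west ", "sales": "80"},
    {"id": 2, "region": "west ", "sales": "80"},   # duplicate record
    {"id": 3, "region": "East", "sales": None},    # missing value
    {"id": 4, "region": "North", "sales": "-15"},  # fails the assumed integrity rule
]

# Validate completeness: find records with missing required fields.
missing = [r["id"] for r in raw if any(r[f] in (None, "") for f in ("region", "sales"))]

# Cleanse: drop incomplete and duplicate rows, standardize text, convert types,
# and enforce the assumed integrity rule (sales cannot be negative).
clean, seen = [], set()
for r in raw:
    if r["id"] in missing or r["id"] in seen:
        continue
    seen.add(r["id"])
    sales = float(r["sales"])
    if sales < 0:
        continue
    clean.append({"id": r["id"], "region": r["region"].strip().title(), "sales": sales})

# Preliminary exploratory analysis: a simple summary of the cleaned data.
summary = {"rows": len(clean), "total_sales": sum(r["sales"] for r in clean)}
print(missing, summary)
```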
What are FOUR ways to improve a model's performance?
1. Prompting 2. RAG 3. Fine-tune Model 4. Pretrain Model
What is a Z-Score?
A Z-Score tells you how many standard deviations a data point is from the mean.
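The Z-Score formula is z = (x - mean) / standard deviation. Below is a minimal Python sketch that treats the data as the whole population, so it uses `statistics.pstdev`. For a sample, `statistics.stdev` would be the usual choice. The data are hypothetical.

```python
from statistics import mean, pstdev

def z_scores(values):
    """z = (x - mean) / standard deviation for each value (population SD)."""
    mu, sigma = mean(values), pstdev(values)
    return [(x - mu) / sigma for x in values]

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical; mean 5, population SD 2
print(z_scores(data))  # 9 is 2 standard deviations above the mean
```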
What does AI automate?
AI doesn't automate a job, but it does automate a task.
What is an example of augmentation of a task with Generative AI?
Augmentation means the AI helps a human with a task rather than doing it alone. EX: The AI recommends a response that a customer-service agent can edit or approve.
What is Big Data? What are the FOUR V's that describe Big Data?
Big Data: data that is too large and complex for a business's centralized system to capture, store, manage, and analyze. FOUR V's: VOLUME - VARIETY - VELOCITY - VERACITY
What are some characteristics that data should have?
Data should be RELEVANT and RELIABLE.
What are the different types of analytics?
Descriptive, Diagnostic, Predictive, Prescriptive, and Adaptive or Autonomous
What is the difference between descriptive and inferential statistics?
Descriptive: a measure that describes a group of interest; Inferential: a measure calculated from only a sample and used to draw conclusions about the desired population
What are some tips for responsible AI?
Fairness, Transparency, Privacy, Security, and Ethics
ETL
EXTRACT - TRANSFORM - LOAD
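Below is a minimal ETL sketch in Python that uses only the standard library. It extracts rows from a hypothetical CSV export, transforms them (converts types and standardizes text), and loads them into an in-memory SQLite table. The table and column names are assumptions.

```python
import csv
import io
import sqlite3

# EXTRACT: read raw rows from a source (a hypothetical CSV export here).
source = io.StringIO("order_id,amount_usd,region\n1,19.99,east\n2,5.00,WEST\n")
extracted = list(csv.DictReader(source))

# TRANSFORM: convert types and standardize the region text.
transformed = [
    (int(r["order_id"]), float(r["amount_usd"]), r["region"].strip().upper())
    for r in extracted
]

# LOAD: write the cleaned rows into a target database (in-memory SQLite here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount_usd REAL, region TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", transformed)
conn.commit()

print(conn.execute("SELECT region, amount_usd FROM orders ORDER BY order_id").fetchall())
```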
Which sectors are expected to see the most impact on their knowledge workers?
Education, Legal Professions, etc.
What does ERP mean? What is it?
ERP stands for Enterprise Resource Planning. ERP is a type of business management software that integrates applications from throughout the business into one system.
What are exploratory and explanatory data visualizations?
Exploratory: a graphical representation that is useful for uncovering patterns and insights in the data; Explanatory: a graphical representation that is useful for communicating the findings of the analysis to stakeholders
What is Generative AI? Who are the major players in the market at this time?
Generative AI is an AI system that can produce high-quality content such as text, images, and audio. EX: ChatGPT (OpenAI), Bard (Google), Bing Chat (Microsoft)
What are the FOUR AI tools mentioned in the course?
Generative AI, Supervised Learning, Unsupervised Learning, and Reinforcement Learning
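What is a TOKEN, and what is it used for in Generative AI?
A token is a unit of text (a whole word or part of a word) that an LLM reads and generates. Tokens measure the length of the input and output, and LLM usage is commonly priced per token.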
What does it mean that Generative AI is a "general purpose technology"?
It is useful for a wide range of tasks rather than one specific application, similar to electricity, so it affects just about every person and industry.
What kind of job(s) will have more impact from Generative AI?
It is expected to impact higher-paid jobs more.
What are some common measures of central tendency?
Mean, Median, and Mode
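The three measures of central tendency are available in Python's `statistics` module. The data below are hypothetical. Note how the one large value (14) pulls the mean above the median.

```python
from statistics import mean, median, mode

# Hypothetical daily unit sales.
units = [7, 2, 14, 5, 3, 7, 4]

avg = mean(units)            # sum / count = 42 / 7
middle = median(units)       # middle value after sorting: 2, 3, 4, 5, 7, 7, 14
most_common = mode(units)    # most frequent value
print(avg, middle, most_common)
```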
Can Generative AI filter out bias automatically?
No
Does Generative AI work well with structured data?
No
What are the types of bias that may be an issue when working with data?
Nonresponse, Selection, Confirmation, and Outlier
What is a Parameter vs. a Statistic when doing analyses on data?
Parameter: a characteristic of a population; Statistic: a characteristic of a sample
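Below is a minimal Python sketch of the difference. The population mean is a parameter, and the mean of a random sample drawn from that population is a statistic that estimates it. The population of order amounts is simulated and hypothetical.

```python
import random
from statistics import mean

# Hypothetical population: order amounts for ALL 1,000 customers.
rng = random.Random(7)
population = [rng.randint(10, 100) for _ in range(1000)]

# Parameter: describes the whole population (usually unknown in practice).
population_mean = mean(population)

# Statistic: computed from a sample and used to estimate the parameter.
sample = rng.sample(population, 50)
sample_mean = mean(sample)

print(population_mean, sample_mean)
```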
What is a Primary and Foreign Key in a relational database?
Primary Key: a field that uniquely identifies each record in a table; Foreign Key: a field in one table that refers to the primary key of another table, creating a relationship between the two tables
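Below is a minimal sketch of primary and foreign keys using Python's built-in `sqlite3` module. The `customers` and `orders` tables and their rows are hypothetical. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# customer_id is the PRIMARY KEY: it uniquely identifies each customer record.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")

# orders.customer_id is a FOREIGN KEY: it links each order to a customer.
conn.execute("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount REAL
    )
""")

conn.execute("INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace')")
conn.execute("INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0)")

# The relationship lets us join the two tables.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers c JOIN orders o ON o.customer_id = c.customer_id
    GROUP BY c.name ORDER BY c.name
""").fetchall()
print(rows)  # → [('Ada', 65.0), ('Grace', 15.0)]

# An order pointing at a customer that does not exist is rejected.
rejected = False
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 5.0)")
except sqlite3.IntegrityError:
    rejected = True
print("rejected:", rejected)
```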
The instructions given to an LLM to perform a task are called what?
Prompt
What are some common measures of dispersion?
Range, Variance, and Standard Deviation
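The measures of dispersion can be computed with Python's `statistics` module. The data are hypothetical. The population versions divide by n, and the sample versions divide by n - 1.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical

data_range = max(data) - min(data)               # 9 - 2
pop_var, pop_sd = pvariance(data), pstdev(data)  # treat data as the whole population
samp_var, samp_sd = variance(data), stdev(data)  # treat data as a sample (n - 1 denominator)
print(data_range, pop_var, pop_sd, samp_var, samp_sd)
```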
What does RLHF stand for? What is it used for?
Reinforcement Learning from Human Feedback. It is used to train an LLM: humans rate the model's answers, and the model is trained to produce the kinds of answers people prefer.
What does RAG stand for and what does it do?
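Retrieval Augmented Generation. It gives an LLM access to external data sources by retrieving relevant documents and adding them to the prompt before the LLM generates its answer.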
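Below is a toy Python sketch of the RAG idea: retrieve the most relevant document, then add it to the prompt. The documents, the word-overlap scoring, and the prompt wording are all hypothetical. Real systems usually retrieve with embeddings, and no actual LLM is called here.

```python
# Hypothetical external data source (e.g., a company's help-center articles).
documents = {
    "refund_policy": "Customers can request a refund within 30 days of purchase.",
    "shipping": "Standard shipping takes 5 to 7 business days.",
    "hours": "Support is available Monday through Friday, 9am to 5pm.",
}

def words(text):
    """Lowercase word set with basic punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()}

def retrieve(question, docs, k=1):
    """Return the k documents that share the most words with the question."""
    q = words(question)
    ranked = sorted(docs.values(), key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, docs):
    """Augment the prompt with the retrieved context."""
    context = "\n".join(retrieve(question, docs))
    return (
        "Use only the context below to answer the question.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

prompt = build_prompt("How many days do I have to request a refund?", documents)
print(prompt)  # in a real system, this prompt would then be sent to the LLM
```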
What are the FOUR components of the SOAR analytics model?
S: SPECIFY THE QUESTION - O: OBTAIN THE DATA - A: ANALYZE THE DATA - R: REPORT THE RESULTS
What are some functional business areas where businesses spend money on Generative AI and receive a significant impact?
Sales, Marketing, Software Engineering, Product R&D, etc...
What is a general time-frame for developing and running a supervised learning AI technique with experienced personnel?
Scope Project - Build/Improve System - Internal Evaluation - Deploy and Monitor
What are some common roles that people play in building Generative AI software? What does each one do?
Software Engineer: responsible for writing the application; Machine-Learning Engineer: responsible for implementing the AI system; Product Manager: responsible for identifying and scoping the project
What are the THREE basic components of a relational database?
TABLE - FIELD - RECORD
Can Generative AI create programming code?
Yes
Does ChatGPT capture all publicly available information on the Internet at some point in time?
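No. It is trained on a large amount of publicly available text up to a knowledge cutoff, but not on all of the information on the Internet.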
Is Generative AI a replacement for doing a web-search?
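No. An LLM can hallucinate and has a knowledge cutoff, so web search is still useful for current information and for checking sources.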
Is it a true statement that LLMs are used as a reasoning engine to process information rather than just as a source of information?
Yes
Are there limits to how much data can be included in the input to or output from a LLM?
Yes. Usually, the input and output are limited to a few thousand words (the exact limit varies by model).
How can you determine the current knowledge cutoff point?
Review the date of the most recent information the model can reference (for example, ask it about recent events).