OIDD 3140 Quiz
Which cloud vendors offer necessary infrastructure for AI model training?
AWS, Google Cloud, Microsoft Azure
What is Amazon offering?
Anthropic's models, offered through AWS (Amazon is a major investor in Anthropic).
What is Transparency regarding model and training data?
Being open about the model and data used for training.
What is RAG suitable for?
Companies without deep engineering expertise.
What is Google's Bard?
Google's vertically integrated offering, available to enterprises through Google Cloud.
What is a potential issue with ChatGPT's token completion approach?
Hallucinations can occur.
What are the costs associated with RAG?
High API costs per query.
What are some tools for RAG?
Vector DBs like Pinecone and Weaviate.
What is the parameter difference between BloombergGPT and ChatGPT-3.5?
50 billion (BloombergGPT) vs. 175 billion (GPT-3.5) parameters.
What are some examples of closed-source ML models?
GPT-4, Bard, and Anthropic's Claude.
What datasets were used to train GPT3?
Common Crawl, WebText2, online book corpora, and Wikipedia.
What is Prompt Engineering?
Customizing prompts to improve model performance.
How do network effects impact the improvement of models?
Feedback from large user base improves model faster
What is BloombergGPT specialized for?
Financial tasks.
What is ChatGPT?
A model fine-tuned for conversational interaction.
What is the purpose of fine-tuning a model like ChatGPT?
To specialize a general-purpose base model for conversational (chat) use cases.
What is the disadvantage of fine-tuning?
High upfront fixed cost.
What is the foundation of AI model training?
High-performance GPUs and cloud services
What types of data can Generative AI work with?
Images, audio, and texts.
What is the generative AI pipeline?
It consists of computing infrastructure/hardware, cloud providers, foundation models, fine-tuned models, and software applications.
What is the advantage of fine-tuning?
Low variable cost.
What are the supply-side benefits of economies of scale for LLMs?
Lower cost per query due to resource pooling and demand aggregation
Which company developed the Llama model?
Meta
How many parameters does GPT4 have?
Undisclosed by OpenAI, but estimated at over 400 billion parameters.
What are some examples of Vector Databases?
Pinecone (closed source) and Weaviate (open source).
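For illustration, a minimal sketch of storing and querying embeddings using the interface of the v2 pinecone-client library; the API key, environment, index name, and vectors are placeholders, and newer client versions (and Weaviate) expose different but analogous APIs:

```python
import pinecone

pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")
index = pinecone.Index("docs")

# Store embeddings (in practice produced by an embedding model).
index.upsert(vectors=[
    ("doc-1", [0.12, 0.98, 0.33], {"text": "Q3 earnings summary"}),
    ("doc-2", [0.87, 0.10, 0.45], {"text": "Troubleshooting guide"}),
])

# Retrieve the stored documents most similar to a query embedding.
result = index.query(vector=[0.11, 0.95, 0.30], top_k=1, include_metadata=True)
```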
What are some concerns related to ChatGPT being trained on historical data?
Plagiarism, consent, compensation, and credit issues.
What should be considered when using AI in consulting?
How intellectual property (IP) is divided, and the use of tools like Copilot.
What was the result of the writing experiment with AI?
The group with access to AI improved in both efficiency and performance.
What are specialized processors needed for?
They are needed to train and run large models; GPUs have emerged as the winning architecture.
How do LLMs work?
They predict the next token (word) given the text so far.
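For illustration, a minimal sketch of this next-token loop, using a hard-coded toy lookup table in place of a real model's predicted probability distribution over its vocabulary:

```python
# Toy "model": maps the last two tokens to a most-likely next token.
TOY_MODEL = {
    ("the", "cat"): "sat",
    ("cat", "sat"): "on",
    ("sat", "on"): "the",
    ("on", "the"): "mat",
}

def predict_next_token(tokens):
    # A real LLM scores every token in its vocabulary and samples from
    # that distribution; this toy just looks up the last two tokens.
    return TOY_MODEL.get(tuple(tokens[-2:]), "<eos>")

tokens = ["the", "cat"]
while True:
    next_token = predict_next_token(tokens)
    if next_token == "<eos>":  # stop when the model "ends" the text
        break
    tokens.append(next_token)

print(" ".join(tokens))  # -> the cat sat on the mat
```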
What are the pros of open-source models?
Transparency, control, no data compromise
How can an app built on a foundation model have a competitive advantage?
Unique and proprietary data, distribution, domain-specific opportunities.
What are the limits on the size of prompts for foundation models?
Up to 100,000 tokens.
How does RAG retrieve information?
Using a database of documents and vector representations.
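A minimal sketch of that retrieval step, using cosine similarity over made-up in-memory embeddings in place of a real embedding model and vector DB:

```python
import numpy as np

# Toy corpus with made-up 3-dimensional "embeddings"; a real system would
# use an embedding model and a vector DB instead of an in-memory array.
docs = ["reset the router", "update the firmware", "refund policy"]
doc_vecs = np.array([[0.9, 0.1, 0.0], [0.8, 0.3, 0.1], [0.0, 0.2, 0.9]])

def retrieve(query_vec, k=2):
    # Cosine similarity between the query vector and every document vector.
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]

print(retrieve(np.array([0.85, 0.2, 0.05])))  # router/firmware docs first
```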
What are GPT4 and ChatGPT?
Versions of Large Language Models (LLMs) developed by OpenAI.
What is the token limit for Anthropic's foundation model?
100,000 tokens.
How many parameters does GPT3 have?
175 billion parameters.
What is the token count difference between BloombergGPT and ChatGPT?
About 700 billion tokens (BloombergGPT) vs. about 500 billion (ChatGPT).
What percentage of the U.S. workforce could have their work tasks affected by LLMs?
80% could have at least 10% of their tasks affected, while 19% may see at least 50% of their tasks impacted.
What can be uploaded to Anthropic's foundation model?
An entire book of up to 100,000 tokens.
What is GPT4?
A developer-facing version of the LLM that provides API access.
What is a Vector Database?
A database that stores vector embeddings (numerical representations of data) and retrieves them by similarity.
What is the Torrance Test of Creative Thinking?
A test to measure creative thinking abilities.
What do studies show about AI-human collaboration?
AI + Humans often outperform AI alone and humans alone, especially in creative settings.
What did Fugener et al (2021) find about image recognition by a Human+AI team?
AI can better delegate to humans than vice versa, as humans are not good delegators.
What is Generative AI?
AI that generates plausible content from unstructured data.
What can developers do with GPT4?
Adjust parameters to control the type of output and finetune the model with custom datasets.
What are some of the most exposed professions to LLMs?
Arts & entertainment, legal services (e.g., contracts), and medicine (e.g., medical imaging).
What are the benefits of a consolidated market for AI models?
Big developer community, network effects, standard online datasets
How does Generative AI learn?
By understanding the underlying patterns in the training data.
What is the second prompt engineering principle?
Chain of Thought (CoT) prompting, providing examples to help the model reason better.
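An illustrative few-shot CoT prompt, with a made-up worked example that shows the model how to reason step by step before answering:

```python
# The solved example demonstrates step-by-step reasoning; the model is
# then likely to imitate that reasoning style for the new question.
prompt = """Q: A store has 23 apples. It sells 9 and receives 12 more. How many now?
A: Start with 23. Selling 9 leaves 23 - 9 = 14. Receiving 12 gives 14 + 12 = 26.
The answer is 26.

Q: A parking lot has 15 cars. 6 leave and 4 arrive. How many now?
A:"""
```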
What are the benefits of open source community?
Collaboration, shared resources, and innovation.
What are the tradeoffs between RAG and fine-tuning?
RAG has low upfront cost but higher per-query API costs; fine-tuning has a high upfront compute cost but low variable cost.
What is the potential outcome of the market for foundational models?
Consolidation around a few major players
How does BloombergGPT outperform general-purpose LLMs?
Curated financial sources in its training dataset.
What is one consideration in designing AI-human collaboration?
Delegation: deciding whether humans should perform tasks and delegate to AI when in doubt, or vice versa.
What are some applications of foundation models?
Deriving competitive advantage, legal contracts, unique understanding of problems.
What is the rationale for considering an open-source model like Llama?
Download and have control, data remains in your cloud environment
What are some examples of harmful content generated by ChatGPT?
Encouraging self-harm, racial profiling recommendations, and bomb-making instructions.
What are some examples of software applications in the generative AI pipeline?
Examples include GitHub Copilot, Sudowrite, Jumpcut Media, and Alltius.
What are some examples of foundation models?
Examples include OpenAI's GPT-4, Google's Bard, Anthropic's Claude, and Meta's Llama.
What are some tools for customizing Generative AI output/prompt engineering?
Examples include RAG and fine-tuning.
What is the requirement for training large neural networks like GPT-4 and PaLM 2?
Extensive resources and expertise.
What can Generative AI do with the learned patterns?
Generate similar content or solve downstream tasks.
What are the limitations of ChatGPT?
Hallucinations, batch training, biases, lack of transparency, and plagiarism.
What are the costs associated with fine-tuning?
High compute costs for retraining.
Why do most companies rely on cloud providers for infrastructure?
High cost and complexity of setting up independent infrastructure
What is Data Privacy?
Keeping data within your own environment (e.g., an S3 bucket) so that neither your data nor the model is shared externally.
What are the benefits of Fine Tuning?
Improved performance for specific contexts and tasks.
How can humans add value in AI-human collaboration?
In many settings, humans want to be in the loop and can contribute their expertise.
What are the costs associated with RAG?
Increased API costs based on prompt and output size.
What are the factors contributing to the high costs of developing models?
Infrastructure and need for massive datasets
What is the computing infrastructure/hardware layer?
It includes the GPUs used to run ML workloads, with Nvidia and AMD designing the chips and TSMC and Intel manufacturing them.
Why is continuous training of LLMs challenging?
It requires a tremendous amount of processing power.
What is a limitation of ChatGPT in providing information?
It struggles to provide sources and may provide made-up citations.
What is an advantage of open-source models like Llama?
Knowing exactly what the model is
How does the performance of LLMs compare to other applications?
LLMs have far greater performance rates than the median application.
What are the cons of using open source models?
Lack of support and steeper learning curve.
What are the potential risks of using open-source models?
Lack of updates, potential security vulnerabilities
What are some capabilities of LLMs?
Language (document summarization), coding (a code-writing companion), and data analysis (automating data analysis).
What are the potential limitations of LLMs?
Limited to the dataset, accuracy, and output variation.
What are some examples of top open-source models?
Llama, Falcon
What is Microsoft offering?
Microsoft is offering OpenAI's models through APIs within the Microsoft (Azure) cloud.
What would happen if Microsoft buys OpenAI?
Microsoft's focus would become narrower, not on artificial general intelligence but on powering enterprise AI needs.
What percentage of BloombergGPT's training dataset consists of curated financial sources?
More than 52%.
What is the advantage of fine-tuning over RAG?
No need to share context for each query.
What was OpenAI's focus before?
OpenAI was focused on foundation models and building the best ones.
What are the risks of proprietary data assets getting compromised?
Other companies can improve their models using your data
What are the cons of open-source models?
Potentially less support, limited features compared to closed-source
What are the two ways of doing Prompt Engineering?
RAG and Fine Tuning.
What is Fine Tuning?
Re-training a pre-trained model on a domain-specific dataset.
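A minimal sketch of how this could be done with OpenAI's fine-tuning endpoint (openai>=1.0 Python client); the dataset file and base model name are placeholders:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload a JSONL file of domain-specific chat examples...
f = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")

# ...then start a fine-tuning job on a base model.
job = client.fine_tuning.jobs.create(training_file=f.id, model="gpt-3.5-turbo")
print(job.id)  # the finished job yields a custom model checkpoint
```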
What is Retrieval Augmented Generation (RAG)?
Retrieving relevant information and appending it to the prompt.
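A minimal sketch of the "append to the prompt" step; retrieved_docs is assumed to come from a vector DB lookup like the one sketched earlier:

```python
def build_rag_prompt(question, retrieved_docs):
    # Prepend the retrieved snippets as context so the model answers
    # from them instead of relying only on its training data.
    context = "\n".join(f"- {d}" for d in retrieved_docs)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )

prompt = build_rag_prompt(
    "How do I reset the router?",
    ["Hold the reset button for 10 seconds.", "Unplug for 30 seconds first."],
)
print(prompt)
```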
What is the first prompt engineering principle?
Setting system parameters like temperature and top-p to control randomness and exploration.
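For illustration, a sketch of setting these parameters with the OpenAI Python client (openai>=1.0); the model name and prompt are placeholders:

```python
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Summarize this contract clause..."}],
    temperature=0.2,  # lower temperature -> less randomness
    top_p=0.9,        # sample only from the top 90% probability mass
)
print(resp.choices[0].message.content)
```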
What are some relevant levers for a product manager designing an AI-human collaboration system?
The prompt engineering principles and other design considerations.
What is the schism at OpenAI?
There is a divide between research and productizing research.
What was the result of the software development experiment with AI?
There was a 55.8% decrease in time to code a server.
What are cloud providers?
They are companies like AWS, Microsoft Azure, and Google Cloud that provide the infrastructure on which machine learning models are set up and trained.
What are fine-tuned models?
They are models that have been customized and optimized for specific tasks.
What is an example of using RAG?
Troubleshooting technical issues with a product.
What was the focus of the 2023 study by OpenAI and Wharton on LLMs?
The U.S. labor market and key work tasks, using ChatGPT to assess how well those tasks could be performed.