Why did I want to read it?

What did I get out of it?

1. Introduction to building AI applications with foundation models

Language models

  • Masked language model: trained to predict missing tokens anywhere in a sequence (e.g., for code debugging).
  • Autoregressive language model: trained to predict the next token using only the preceding tokens (e.g., for text generation). (p. 4) The two objectives are contrasted in the sketch below.
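
A minimal sketch of the two objectives using Hugging Face transformers pipelines; the model names are my own illustrative picks, not from the book:

```python
from transformers import pipeline

# Masked LM: fills in a hidden token anywhere in the sequence.
fill = pipeline("fill-mask", model="bert-base-uncased")
print(fill("The bug is in the [MASK] function.")[0]["token_str"])

# Autoregressive LM: continues the sequence using only the preceding tokens.
generate = pipeline("text-generation", model="gpt2")
print(generate("The quick brown fox", max_new_tokens=5)[0]["generated_text"])
```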

As simple as it sounds, completion is incredibly powerful. Many tasks, including translation, summarization, coding, and solving math problems, can be framed as completion tasks. For example, given the prompt: “‘How are you’ in French is …”, a language model might be able to complete it with: “Comment ça va”, effectively translating from one language to another. (p. 6)
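
The same framing in code, sketched with the OpenAI Python client; the model name is an illustrative choice, and any completion-capable model would do:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Translation posed as plain completion: the model simply continues the text.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative choice
    messages=[{"role": "user", "content": '"How are you" in French is ...'}],
)
print(response.choices[0].message.content)  # e.g., "Comment ça va"
```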

While many people still call Gemini and GPT-4V LLMs, they’re better characterized as foundation models. The word foundation signifies both the importance of these models in AI applications and the fact that they can be built upon for different needs. (p. 9)

Self-supervision works for multimodal models too. (…) Instead of manually generating labels for each image, they found (image, text) pairs that co-occurred on the internet.
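
A compact sketch of how such co-occurring (image, text) pairs supervise training, in the spirit of CLIP’s contrastive objective (PyTorch; the batch size, embedding width, and random inputs are stand-ins for real encoder outputs):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """CLIP-style loss: each image's positive example is the text it co-occurred with."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.T / temperature  # pairwise similarities
    targets = torch.arange(len(logits))            # matching pairs lie on the diagonal
    # Symmetric cross-entropy: images pick their text, texts pick their image.
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2

loss = contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```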

6. RAG and Agents

Anthropic suggested that for Claude models, if “your knowledge base is smaller than 200,000 tokens (about 500 pages of material), you can just include the entire knowledge base in the prompt that you give the model, with no need for RAG or similar methods” (Anthropic, 2024). It’d be amazing if other model developers provided similar guidance on RAG versus long context for their models. (p. 256)
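
A rough decision helper built on that guidance; tiktoken is used as a stand-in tokenizer (Claude’s actual tokenizer differs, so the count is only an estimate), and the threshold is Anthropic’s 200,000-token figure quoted above:

```python
import tiktoken

THRESHOLD = 200_000  # Anthropic's suggested cutoff for skipping RAG

def fits_in_context(documents: list[str]) -> bool:
    """Estimate whether the whole knowledge base fits in a single prompt."""
    enc = tiktoken.get_encoding("cl100k_base")  # approximation of the real tokenizer
    total = sum(len(enc.encode(doc)) for doc in documents)
    return total < THRESHOLD

# If True, stuff the entire knowledge base into the prompt; otherwise use RAG.
```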

Dense retrievers represent data using dense vectors. A dense vector is a vector where the majority of the values aren’t 0. Embedding-based retrieval is typically considered dense, as embeddings are generally dense vectors. However, there are also sparse embeddings. (p. 258)
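
A minimal dense-retrieval sketch with sentence-transformers; the model name and toy corpus are my own:

```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model
corpus = ["RAG retrieves relevant context.", "Agents can call tools.", "Paris is in France."]
doc_emb = model.encode(corpus, normalize_embeddings=True)  # dense vectors

query_emb = model.encode(["What does RAG do?"], normalize_embeddings=True)
scores = doc_emb @ query_emb.T  # cosine similarity, since vectors are unit-norm
print(corpus[int(np.argmax(scores))])
```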

TF-IDF (p. 259)
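
TF-IDF scores a term by its frequency within a document, discounted by how many documents contain it, which yields vectors that are mostly zeros, i.e., sparse. A sketch of sparse retrieval with scikit-learn (the corpus and query are my own):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = ["RAG retrieves relevant context.", "Agents can call tools.", "Paris is in France."]
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(corpus)  # sparse matrix: most entries are 0

query_vector = vectorizer.transform(["What does RAG do?"])
scores = cosine_similarity(query_vector, doc_vectors)[0]
print(corpus[scores.argmax()])
```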

For large datasets, vector search is done with approximate nearest neighbor (ANN) algorithms. Vector databases organize vectors into buckets or graphs to make search faster. (p. 262)
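
An ANN sketch with FAISS, using an inverted-file index that clusters vectors into buckets; the dimensionality, dataset size, and cluster count are arbitrary choices:

```python
import faiss
import numpy as np

d, n = 128, 100_000
vectors = np.random.rand(n, d).astype("float32")

quantizer = faiss.IndexFlatL2(d)               # exact index used to assign buckets
index = faiss.IndexIVFFlat(quantizer, d, 256)  # 256 buckets (clusters)
index.train(vectors)                           # learn the bucket centroids
index.add(vectors)

index.nprobe = 8                               # buckets searched per query: speed/recall knob
distances, ids = index.search(vectors[:1], 5)  # approximate 5 nearest neighbors
```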

Another concern is cost. Generating embeddings costs money. This is especially an issue if your data changes frequently and requires frequent embedding regeneration. Imagine having to generate embeddings for 100 million documents every day! Depending on which vector database you use, vector storage and vector search queries can be expensive, too. It’s not uncommon to see a company’s vector database spending be one-fifth or even half of their spending on model APIs. (p. 265)
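
A back-of-envelope version of that scenario; the average document length and per-token price below are assumptions for illustration, not figures from the book:

```python
# Assumed numbers, purely illustrative:
docs_per_day = 100_000_000       # re-embedding 100 million documents daily
tokens_per_doc = 500             # assumed average document length
price_per_million_tokens = 0.02  # assumed embedding price in USD

daily_tokens = docs_per_day * tokens_per_doc
daily_cost = daily_tokens / 1_000_000 * price_per_million_tokens
print(f"${daily_cost:,.0f} per day")  # $1,000/day at these assumptions
```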

  • Performance considerations

7. Finetuning