Foundation models are characterized by their scale (billions of parameters), the breadth of their training data (diverse internet-scale text, code, and increasingly images and audio), and their emergent capabilities—behaviors that appear at scale and were not explicitly trained for, such as in-context learning and multi-step reasoning.
- • Scale: billions of parameters trained on trillions of tokens.
- • Generality: capable of many tasks with minimal task-specific data.
- • Emergence: capabilities arise at scale that were not present in smaller models.