Language models (LMs) are a class of probabilistic models that learn patterns in natural language. LMs can be utilized for generative purposes to generate, say, the next event in a story by exploiting their knowledge of these patterns.
In recent years, significant efforts have been put into scaling LMs into Large Language Models (LLMs). The scaling process - training bigger models on more data with greater compute - leads to steady and predictable improvements in their ability to learn these patterns, which can be observed in improvements to quantitative metrics.
In addition to these steady quantitative improvements, the scaling process also leads to interesting qualitative behavior. As LLMs are scaled they hit a series of critical scales at which new abilities are suddenly “unlocked”. LLMs are not directly trained to have these abilities, and they appear in rapid and unpredictable ways as if emerging out of thin air. These emergent abilities include performing arithmetic, answering questions, summarizing passages, and more, which LLMs learn simply by observing natural language.
What is the cause of these emergent abilities, and what do they mean? In this article, we’ll explore the concept of emergence as a whole before exploring it with respect to Large Language Models. We’ll end with some notes about what this means for AI as a whole. Let’s dive in!