
Catastrophic Forgetting

Training a neural network on new data causes rapid, severe loss of performance on previously learned data. The network doesn’t gradually forget — it catastrophically overwrites old knowledge with new knowledge. Also called “catastrophic interference.”

Think of a neural network as a shared whiteboard. When you learn task A, you write on the whiteboard. When you then learn task B, you erase parts of A to make room for B — not because you intended to, but because the same parameters (whiteboard space) must represent both tasks, and gradient descent doesn’t know which parts of A are still important.

The deeper issue is that neural networks store knowledge in distributed representations — information about task A is spread across all weights, not localised to a specific subset. When you update weights to improve on task B, every weight change is a potential corruption of task A knowledge. The more different A and B are, the more the updates conflict.

This is qualitatively different from human forgetting, which is gradual and graceful. A neural network can go from 95% accuracy on task A to 20% after a few batches of task B. The knowledge isn’t gradually fading — it’s being actively overwritten.
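The overwriting is easy to reproduce even in the smallest possible model. A toy numpy sketch (made-up data, a single shared weight — both tasks must live in the same parameter, like the whiteboard analogy above): train on task A, then train only on task B, and watch task-A loss jump.

```python
import numpy as np

# Toy demo of catastrophic forgetting: one shared linear weight trained
# sequentially on two conflicting regression tasks. Data is synthetic.
rng = np.random.default_rng(0)

def make_task(slope, n=100):
    x = rng.uniform(-1, 1, size=n)
    return x, slope * x

def mse(w, x, y):
    return float(np.mean((w * x - y) ** 2))

def sgd(w, x, y, lr=0.1, steps=200):
    for _ in range(steps):
        grad = np.mean(2 * (w * x - y) * x)  # d/dw of the MSE
        w -= lr * grad
    return w

xa, ya = make_task(slope=2.0)   # task A: y = 2x
xb, yb = make_task(slope=-2.0)  # task B: y = -2x, conflicts with A

w = sgd(0.0, xa, ya)
loss_a_before = mse(w, xa, ya)  # near zero: A is learned

w = sgd(w, xb, yb)              # sequential training on B only
loss_a_after = mse(w, xa, ya)   # A is overwritten, not faded

print(f"task-A loss after A: {loss_a_before:.4f}")
print(f"task-A loss after B: {loss_a_after:.4f}")
```

Nothing in the task-B gradient refers to task A, so nothing stops the shared weight from moving all the way to the task-B optimum.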

  • Performance on old tasks drops sharply when training on new data — not a gradual decline but a cliff
  • The severity of the drop scales with how different the new data distribution is from the old one
  • Fine-tuning a pretrained model on a small dataset can destroy the general capabilities the model spent millions of examples learning
  • In RL: replay buffers exist specifically to mitigate this — without replay, the agent forgets how to handle states it hasn’t visited recently
  • Q-learning (q-learning/): without a replay buffer, the agent only trains on recent transitions, forgetting Q-values for states it visited earlier → replay buffers are the primary mitigation
  • NN training (nn-training/): fine-tuning is a controlled form of this problem — learning rate warmup, freezing early layers, and small learning rates are all strategies to limit forgetting
  • Policy gradient (policy-gradient/): on-policy methods (A2C, PPO) discard data after use, so the policy only reflects recent experience — but the shared value function can still suffer from forgetting
  • Contrastive learning (contrastive-self-supervising/): fine-tuning CLIP or SimCLR representations on a downstream task can damage the general-purpose features — careful unfreezing schedules help
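The replay-buffer mitigation mentioned in the Q-learning bullet can be sketched as a bounded buffer with uniform sampling (capacity, batch size, and the transition format here are illustrative, not from any particular library):

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal replay buffer sketch: store old transitions so every
    training batch mixes old and recent experience."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # oldest dropped when full

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling: the network keeps seeing earlier states,
        # which is exactly what counters forgetting.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=100)
for t in range(500):
    buf.push(t, 0, 1.0, t + 1, False)  # dummy transitions

batch = buf.sample(8)  # each gradient update trains on this mix
```

Without the buffer, each update would be computed only from the most recent transitions, and Q-values for states outside that window would drift.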
| Solution | Mechanism | Where documented |
| --- | --- | --- |
| Replay buffers | Store and replay old experiences alongside new ones | atomic-concepts/rl-specific/replay-buffers.md |
| EWC (Elastic Weight Consolidation) | Penalise changes to weights that were important for old tasks | (Kirkpatrick et al., 2017) |
| Frozen early layers | Only fine-tune the last few layers, preserving learned features | (standard fine-tuning practice) |
| Learning rate warmup / small LR | Limit the magnitude of updates during fine-tuning to reduce overwriting | atomic-concepts/optimisation-primitives/learning-rate-warmup.md |
| Data mixing | Train on a mix of old and new data to maintain performance on both | (standard practice) |
| Progressive networks | Add new capacity for new tasks instead of reusing old parameters | (Rusu et al., 2016) |
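Of the solutions above, EWC has the most concrete mechanism: after task A, keep a copy of the weights θ_A and a per-weight importance estimate F (the diagonal of the Fisher information), then add the penalty (λ/2) Σᵢ Fᵢ(θᵢ − θ_A,ᵢ)² while training on task B. A hedged numpy sketch (the weights, Fisher values, and λ below are made-up illustrations, not from Kirkpatrick et al.):

```python
import numpy as np

def ewc_penalty(theta, theta_a, fisher, lam=1000.0):
    # (lam / 2) * sum_i F_i * (theta_i - theta_A_i)^2
    return 0.5 * lam * np.sum(fisher * (theta - theta_a) ** 2)

def ewc_penalty_grad(theta, theta_a, fisher, lam=1000.0):
    # Added to the task-B gradient: pulls important weights back to theta_A.
    return lam * fisher * (theta - theta_a)

theta_a = np.array([1.0, -2.0, 0.5])  # weights saved after task A
fisher  = np.array([10.0, 0.1, 5.0])  # importance: indices 0 and 2 matter for A
theta   = np.array([1.1, 0.0, 0.5])   # current weights during task B

# Moving the unimportant weight (index 1) is cheap; moving an important
# one (index 0) by the same amount would cost ~100x more.
print(ewc_penalty(theta, theta_a, fisher))
print(ewc_penalty_grad(theta, theta_a, fisher))
```

The asymmetry is the point: task B is free to reuse whiteboard space that task A never relied on, but pays a quadratic price for overwriting space that it did.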

McCloskey & Cohen (1989) and Ratcliff (1990) first demonstrated catastrophic interference in connectionist networks, showing that sequential training on different patterns destroyed previously learned associations. The finding challenged the prevailing optimism about neural networks as general learning systems. For decades, it was considered a fundamental limitation. French (1999) wrote an influential review arguing that the problem was inherent to distributed representations. In RL, Lin (1992) introduced experience replay specifically to address forgetting, and the DQN paper (Mnih et al., 2015) showed that replay was essential for stable deep RL — without it, the network catastrophically forgets Q-values for earlier states. The problem has renewed urgency in the era of foundation models, where fine-tuning risks destroying expensive pretrained capabilities.