
Catastrophic Forgetting

Training a neural network on new data causes rapid, severe loss of performance on previously learned data. The network doesn’t gradually forget — it catastrophically overwrites old knowledge with new knowledge. Also called “catastrophic interference.”

Think of a neural network as a shared whiteboard. When you learn task A, you write on the whiteboard. When you then learn task B, you erase parts of A to make room for B — not because you intended to, but because the same parameters (whiteboard space) must represent both tasks, and gradient descent doesn’t know which parts of A are still important.

The deeper issue is that neural networks store knowledge in distributed representations — information about task A is spread across all weights, not localised to a specific subset. When you update weights to improve on task B, every weight change is a potential corruption of task A knowledge. The more different A and B are, the more the updates conflict.

This is qualitatively different from human forgetting, which is gradual and graceful. A neural network can go from 95% accuracy on task A to 20% after a few batches of task B. The knowledge isn’t gradually fading — it’s being actively overwritten.
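The overwriting is easy to reproduce even in the smallest possible model. A toy numpy sketch (made-up data, a single shared weight — both tasks must live in the same parameter, like the whiteboard analogy above): train on task A, then train only on task B, and watch task-A loss jump.

```python
import numpy as np

# Toy demo of catastrophic forgetting: one shared linear weight trained
# sequentially on two conflicting regression tasks. Data is synthetic.
rng = np.random.default_rng(0)

def make_task(slope, n=100):
    x = rng.uniform(-1, 1, size=n)
    return x, slope * x

def mse(w, x, y):
    return float(np.mean((w * x - y) ** 2))

def sgd(w, x, y, lr=0.1, steps=200):
    for _ in range(steps):
        grad = np.mean(2 * (w * x - y) * x)  # d/dw of the MSE
        w -= lr * grad
    return w

xa, ya = make_task(slope=2.0)   # task A: y = 2x
xb, yb = make_task(slope=-2.0)  # task B: y = -2x, conflicts with A

w = sgd(0.0, xa, ya)
loss_a_before = mse(w, xa, ya)  # near zero: A is learned

w = sgd(w, xb, yb)              # sequential training on B only
loss_a_after = mse(w, xa, ya)   # A is overwritten, not faded

print(f"task-A loss after A: {loss_a_before:.4f}")
print(f"task-A loss after B: {loss_a_after:.4f}")
```

Nothing in the task-B gradient refers to task A, so nothing stops the shared weight from moving all the way to the task-B optimum.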

  • Performance on old tasks drops sharply when training on new data — not a gradual decline but a cliff
  • The severity of the drop scales with how different the new data distribution is from the old one
  • Fine-tuning a pretrained model on a small dataset can destroy the general capabilities the model spent millions of examples learning
  • In RL: replay buffers exist specifically to mitigate this — without replay, the agent forgets how to handle states it hasn’t visited recently
  • Q-learning (q-learning/): without a replay buffer, the agent only trains on recent transitions, forgetting Q-values for states it visited earlier → replay buffers are the primary mitigation
  • NN training (nn-training/): fine-tuning is a controlled form of this problem — learning rate warmup, freezing early layers, and small learning rates are all strategies to limit forgetting
  • Policy gradient (policy-gradient/): on-policy methods (A2C, PPO) discard data after use, so the policy only reflects recent experience — but the shared value function can still suffer from forgetting
  • Contrastive learning (contrastive-self-supervising/): fine-tuning CLIP or SimCLR representations on a downstream task can damage the general-purpose features — careful unfreezing schedules help
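The replay-buffer mitigation mentioned in the Q-learning bullet can be sketched as a bounded buffer with uniform sampling (capacity, batch size, and the transition format here are illustrative, not from any particular library):

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal replay buffer sketch: store old transitions so every
    training batch mixes old and recent experience."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # oldest dropped when full

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling: the network keeps seeing earlier states,
        # which is exactly what counters forgetting.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=100)
for t in range(500):
    buf.push(t, 0, 1.0, t + 1, False)  # dummy transitions

batch = buf.sample(8)  # each gradient update trains on this mix
```

Without the buffer, each update would be computed only from the most recent transitions, and Q-values for states outside that window would drift.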
| Solution | Mechanism | Where documented |
| --- | --- | --- |
| Replay buffers | Store and replay old experiences alongside new ones | atomic-concepts/rl-specific/replay-buffers.md |
| EWC (Elastic Weight Consolidation) | Penalise changes to weights that were important for old tasks | (Kirkpatrick et al., 2017) |
| Frozen early layers | Only fine-tune the last few layers, preserving learned features | (standard fine-tuning practice) |
| Learning rate warmup / small LR | Limit the magnitude of updates during fine-tuning to reduce overwriting | atomic-concepts/optimisation-primitives/learning-rate-warmup.md |
| Data mixing | Train on a mix of old and new data to maintain performance on both | (standard practice) |
| Progressive networks | Add new capacity for new tasks instead of reusing old parameters | (Rusu et al., 2016) |
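Of the solutions above, EWC has the most concrete mechanism: after task A, keep a copy of the weights θ_A and a per-weight importance estimate F (the diagonal of the Fisher information), then add the penalty (λ/2) Σᵢ Fᵢ(θᵢ − θ_A,ᵢ)² while training on task B. A hedged numpy sketch (the weights, Fisher values, and λ below are made-up illustrations, not from Kirkpatrick et al.):

```python
import numpy as np

def ewc_penalty(theta, theta_a, fisher, lam=1000.0):
    # (lam / 2) * sum_i F_i * (theta_i - theta_A_i)^2
    return 0.5 * lam * np.sum(fisher * (theta - theta_a) ** 2)

def ewc_penalty_grad(theta, theta_a, fisher, lam=1000.0):
    # Added to the task-B gradient: pulls important weights back to theta_A.
    return lam * fisher * (theta - theta_a)

theta_a = np.array([1.0, -2.0, 0.5])  # weights saved after task A
fisher  = np.array([10.0, 0.1, 5.0])  # importance: indices 0 and 2 matter for A
theta   = np.array([1.1, 0.0, 0.5])   # current weights during task B

# Moving the unimportant weight (index 1) is cheap; moving an important
# one (index 0) by the same amount would cost ~100x more.
print(ewc_penalty(theta, theta_a, fisher))
print(ewc_penalty_grad(theta, theta_a, fisher))
```

The asymmetry is the point: task B is free to reuse whiteboard space that task A never relied on, but pays a quadratic price for overwriting space that it did.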

McCloskey & Cohen (1989) and Ratcliff (1990) first demonstrated catastrophic interference in connectionist networks, showing that sequential training on different patterns destroyed previously learned associations. The finding challenged the prevailing optimism about neural networks as general learning systems. For decades, it was considered a fundamental limitation. French (1999) wrote an influential review arguing that the problem was inherent to distributed representations. In RL, Lin (1992) introduced experience replay specifically to address forgetting, and the DQN paper (Mnih et al., 2015) showed that replay was essential for stable deep RL — without it, the network catastrophically forgets Q-values for earlier states. The problem has renewed urgency in the era of foundation models, where fine-tuning risks destroying expensive pretrained capabilities.