# Training Oscillation
Two adversarially coupled components chase each other without converging: the generator adapts to fool the discriminator, the discriminator adapts to catch the generator, and the cycle repeats without either reaching a stable equilibrium. Also called “oscillatory dynamics” or “non-convergence” in game-theoretic settings.
## Intuition
Imagine two chess players who can only learn by playing each other. Player A discovers a winning strategy. Player B adapts to counter it. Player A then adapts to counter the counter. Neither player ever settles into a stable strategy — they keep cycling through a sequence of moves and counter-moves. If they’re evenly matched, this cycle continues forever.
GANs are a two-player minimax game, and gradient descent on such games doesn’t have the same convergence guarantees as gradient descent on a single loss function. In a single-objective setting, the loss landscape has a clear “downhill” direction. In a minimax game, there may be no fixed point that both players converge to — the gradient field can rotate around the equilibrium rather than pointing toward it. Each player’s update makes the other player’s current state worse, and the combined dynamics spiral.
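This rotation shows up even in the simplest possible minimax problem. The sketch below (plain Python, purely illustrative; the function name is mine) runs simultaneous gradient descent/ascent on the bilinear game f(x, y) = x·y, whose only equilibrium is (0, 0). Instead of converging, the iterates circle the equilibrium and slowly spiral outward.

```python
# Simultaneous gradient descent/ascent on f(x, y) = x * y.
# x minimises f, y maximises f; the unique equilibrium is (0, 0).
import math

def simulate(steps=200, lr=0.1, x=1.0, y=0.0):
    radii = []
    for _ in range(steps):
        gx, gy = y, x                     # df/dx = y, df/dy = x
        x, y = x - lr * gx, y + lr * gy   # both players step at once
        radii.append(math.hypot(x, y))    # distance from equilibrium
    return radii

radii = simulate()
print(radii[0], radii[-1])  # the distance grows: a diverging spiral
```

A short calculation confirms it: each simultaneous step multiplies the squared distance from the origin by (1 + lr²), so the dynamics rotate around the equilibrium and drift away from it, never toward it.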
The problem is amplified when one player is much stronger than the other. If the discriminator is too powerful, it provides no useful gradient to the generator (gradients vanish or point in erratic directions). If the generator is too strong, the discriminator can’t keep up and the generator overfits to its current weaknesses.
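The vanishing-gradient half of this claim can be checked numerically. With the original minimax generator loss log(1 − D(G(z))), the gradient with respect to the discriminator’s logit collapses once D confidently rejects fakes; the standard non-saturating alternative −log D(G(z)) keeps a usable gradient. A minimal sketch (plain Python; the function names are illustrative):

```python
# Gradients of the two classic generator losses w.r.t. the
# discriminator logit on a fake sample.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def grad_saturating(logit):
    # d/dlogit of log(1 - sigmoid(logit)) = -sigmoid(logit)
    return -sigmoid(logit)

def grad_non_saturating(logit):
    # d/dlogit of -log(sigmoid(logit)) = sigmoid(logit) - 1
    return sigmoid(logit) - 1.0

# A very negative logit means D confidently rejects the fake.
for logit in (-10.0, -2.0, 0.0):
    print(logit, grad_saturating(logit), grad_non_saturating(logit))
```

At logit = −10 the saturating loss gives a gradient of roughly −5e-5 (effectively no learning signal), while the non-saturating loss still gives a gradient near −1.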
## Manifestation
- Generator and discriminator losses oscillate in anti-phase — when one improves, the other degrades, and vice versa
- Generated sample quality fluctuates — good epochs alternate with bad epochs rather than steadily improving
- FID/IS metrics oscillate rather than decreasing monotonically
- No clear “convergence” — training can run for millions of steps without reaching a stable state
- Learning rate sensitivity: small changes in learning rate ratios between the two networks dramatically change training dynamics
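One rough way to quantify the anti-phase pattern above is to correlate the two loss curves: a strongly negative correlation suggests the players are trading progress rather than jointly converging. A toy sketch on synthetic curves (all names are illustrative, and real loss curves are far noisier than this):

```python
# Pearson correlation between two synthetic, perfectly anti-phase
# loss curves: when G's loss falls, D's rises, and vice versa.
import math

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb)

g_loss = [1.0 + 0.3 * math.sin(0.2 * t) for t in range(500)]
d_loss = [1.0 - 0.3 * math.sin(0.2 * t) for t in range(500)]
print(pearson(g_loss, d_loss))  # close to -1: anti-phase oscillation
```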
## Where It Appears
- GANs (gans/): the canonical setting — the G–D minimax game has oscillatory dynamics by nature; WGAN’s Wasserstein objective and hinge loss both help by providing smoother, more informative gradients
- Policy gradient (policy-gradient/): not directly adversarial, but the interplay between policy and value-function updates can create oscillation, especially when the value function can’t keep up with rapid policy changes
- Multi-agent RL: when multiple agents learn simultaneously, each agent’s environment is non-stationary (the other agents are changing), creating oscillatory dynamics similar to GANs
## Solutions at a Glance
| Solution | Mechanism | Where documented |
|---|---|---|
| WGAN / Wasserstein distance | Smoother loss landscape with less rotational dynamics | gans/ |
| Hinge loss | Saturates the discriminator at a margin, preventing it from becoming too strong | gans/, atomic-concepts/loss-functions/hinge-loss.md |
| Spectral normalisation | Constrains discriminator Lipschitz constant, balancing the two players | atomic-concepts/regularisation/spectral-normalisation.md |
| Two-timescale updates | Train discriminator multiple steps per generator step (or use different learning rates) | gans/ |
| Gradient penalty | Regularises the discriminator’s gradient, smoothing the loss landscape | atomic-concepts/regularisation/gradient-penalty.md |
| EMA of generator weights | Average generator weights over time to smooth oscillatory weight trajectories | atomic-concepts/optimisation-primitives/exponential-moving-average.md |
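As one concrete example from the table, EMA of generator weights takes only a few lines. The sketch below is a minimal, library-free illustration; the decay value and names are illustrative. The EMA copy changes slowly, averaging out the oscillatory weight trajectory, and samples are drawn from the EMA copy at evaluation time.

```python
# Exponential moving average over a list of weights. The raw weight
# oscillates around its "true" value 1.0; the EMA copy stays close to 1.0.
import math

def ema_update(ema_weights, weights, decay=0.99):
    return [decay * e + (1 - decay) * w
            for e, w in zip(ema_weights, weights)]

w_ema = [0.0]                                # EMA copy, warm-started at 0
for step in range(5000):
    raw = [1.0 + 0.5 * math.sin(step)]       # oscillating "trained" weight
    w_ema = ema_update(w_ema, raw)

print(w_ema[0])  # near 1.0, despite the raw weight swinging by ±0.5
```

The same idea underlies weight-averaging utilities in common frameworks; a higher decay gives a smoother but more slowly adapting average.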
## Historical Context
Oscillatory dynamics in adversarial training were observed from the earliest GAN experiments (Goodfellow et al., 2014). The theoretical analysis came from game theory: simultaneous gradient descent on minimax games was known to be non-convergent in general (the dynamics rotate around Nash equilibria rather than converging to them). Mescheder et al. (2018) provided a thorough spectral analysis showing that GAN training dynamics have eigenvalues with large imaginary components, which corresponds to rotational (oscillatory) dynamics. This understanding motivated the shift toward regularised objectives (WGAN-GP, spectral normalisation) that reduce the rotational component. The instability of adversarial training was ultimately one of the major motivations for the field’s migration toward diffusion models, which replace the adversarial game with a stable regression objective.