MED-RL | Hassam U. Sheikh

Published at ICLR 2022.

Ensemble RL suffers from value function collapse: the ensemble members converge to similar representations, defeating the purpose of the ensemble. We propose five regularization methods that maximize representation diversity in parameter space, preventing collapse and significantly improving stability.
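
To make the idea of a parameter-space diversity regularizer concrete, here is a minimal PyTorch sketch. It is not one of the five methods from the paper: it swaps in a simple stand-in, the mean pairwise cosine similarity between the flattened parameter vectors of the ensemble's Q-networks. The names `make_q_net`, `flat_params`, `diversity_penalty`, and the weight `lam` are hypothetical and chosen only for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_q_net(obs_dim: int, n_actions: int) -> nn.Module:
    """Small MLP Q-network (hypothetical architecture, for illustration only)."""
    return nn.Sequential(
        nn.Linear(obs_dim, 32),
        nn.ReLU(),
        nn.Linear(32, n_actions),
    )


def flat_params(net: nn.Module) -> torch.Tensor:
    """Concatenate all parameters of a network into one 1-D tensor."""
    return torch.cat([p.reshape(-1) for p in net.parameters()])


def diversity_penalty(ensemble) -> torch.Tensor:
    """Mean pairwise cosine similarity between ensemble parameter vectors.

    The value is 1 when all members are identical and decreases as they
    spread out, so adding it to the loss discourages collapse.
    Illustrative stand-in, NOT one of the paper's regularizers.
    """
    vecs = torch.stack([flat_params(net) for net in ensemble])  # (K, P)
    unit = F.normalize(vecs, dim=1)                             # unit-length rows
    sim = unit @ unit.T                                         # (K, K) cosine matrix
    k = sim.shape[0]
    off_diag_sum = sim.sum() - sim.diagonal().sum()             # drop self-similarity
    return off_diag_sum / (k * (k - 1))


if __name__ == "__main__":
    torch.manual_seed(0)
    ensemble = [make_q_net(obs_dim=4, n_actions=2) for _ in range(3)]
    td_loss = torch.tensor(0.0)  # placeholder for the usual Q-learning loss
    lam = 0.01                   # assumed regularization weight
    loss = td_loss + lam * diversity_penalty(ensemble)
    loss.backward()              # gradients flow into every ensemble member
```

In a training loop the penalty would be added to the summed TD losses of the ensemble members, with `lam` tuned so that the diversity term does not dominate the value-learning objective.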

Stack: Python, PyTorch, deep Q-learning, ensemble methods
