Scaling LLM Reinforcement Learning with Prolonged Training Using ProRL v2

Currently, one of the most compelling questions in AI is whether large language models (LLMs) can continue to improve through sustained reinforcement learning (RL), or if their capabilities will eventually plateau. Developed by NVIDIA Research, ProRL v2 is the latest evolution of Prolonged Reinforcement Learning (ProRL), specifically designed to test the effects of extended RL training on…
