Mastering the Limits: A New Way to Generalize Offline RL
Offline reinforcement learning (RL) faces a fundamental challenge: value-estimation errors compound when deep Q-functions are queried on state-action pairs outside the training distribution, which degrades how well the learned policy generalizes beyond the dataset. Existing methods counter this with heavy conservatism, which in turn limits generalization. But here's an interes
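To make the conservatism trade-off concrete, here is a minimal sketch of one common remedy, a CQL-style penalty that pushes down Q-values on out-of-distribution actions while supporting the action seen in the dataset. This is an illustrative assumption, not the method proposed here; the function name `conservative_q_loss` and the weight `alpha` are hypothetical.

```python
# Illustrative sketch (not this paper's method): a CQL-style conservative
# penalty over discrete actions at a single state.
import numpy as np

def conservative_q_loss(q_values, dataset_action, alpha=1.0):
    """q_values: 1-D array of Q(s, a) over all actions at one state.
    dataset_action: index of the action observed in the offline dataset.
    Returns alpha * (logsumexp(Q) - Q(s, a_data)). Minimizing this term
    lowers Q on unseen actions while holding up Q on dataset actions,
    which is exactly the conservatism that can over-restrict the policy."""
    logsumexp = np.log(np.sum(np.exp(q_values)))
    return alpha * (logsumexp - q_values[dataset_action])

rng = np.random.default_rng(0)
q = rng.normal(size=4)  # toy Q(s, a) for 4 discrete actions
penalty = conservative_q_loss(q, dataset_action=2)
print(float(penalty))
```

Because logsumexp upper-bounds every individual Q-value, the penalty is always positive, so the critic is pessimistic everywhere the behavior data is sparse.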