“The authors did just that, in a paper published in December. “We learn an adversarial value function which predicts from experience which situations are most likely to cause failures for the agent.” The agent in this case refers to a reinforcement learning agent. “We then use this learned function for optimisation to focus the evaluation…
