Reinforcement learning needs a trustworthy simulator
Reinforcement learning is only as useful as its environment.
For racing or control tasks, the simulator must encode enough physics, constraints, and penalties for the learned policy to transfer to the real system. A reward function that pays out for the wrong shortcut will confidently produce the wrong behavior.
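A minimal sketch of that failure mode, assuming a toy racing setup that is not from any particular simulator: the `Step` record, both reward functions, the penalty value, and the trajectory numbers are all hypothetical and chosen only for illustration. It compares two fixed trajectories through the same corner under a reward that ignores track limits and one that penalizes leaving the track.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One simulated time step (hypothetical, illustration only)."""
    progress: float   # metres gained along the track this step
    off_track: bool   # True if the car left the track surface


def progress_only_reward(step: Step) -> float:
    """Naive reward: pays for progress and ignores track limits."""
    return step.progress


def constrained_reward(step: Step, off_track_penalty: float = 5.0) -> float:
    """Same progress term, minus a penalty whenever the car is off track."""
    return step.progress - (off_track_penalty if step.off_track else 0.0)


def episode_return(steps, reward_fn) -> float:
    """Undiscounted sum of per-step rewards over one trajectory."""
    return sum(reward_fn(s) for s in steps)


# Two hand-written six-step trajectories (made-up numbers).
trajectories = {
    # Stays on the track surface the whole time.
    "follow_track": [Step(1.0, False)] * 6,
    # Cuts across the grass for two steps and gains extra progress.
    "cut_corner": [Step(1.0, False)] * 4 + [Step(3.0, True)] * 2,
}


def preferred(reward_fn) -> str:
    """Name of the trajectory with the highest return under reward_fn."""
    return max(trajectories, key=lambda name: episode_return(trajectories[name], reward_fn))


if __name__ == "__main__":
    for fn in (progress_only_reward, constrained_reward):
        print(fn.__name__, "prefers", preferred(fn))
```

Under the progress-only reward the corner-cutting trajectory scores higher, so an optimizer would be pushed toward it; once the off-track penalty is part of the environment, the on-track trajectory wins. The learning algorithm is the same in both cases, and only the environment definition changed.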
The simulator is part of the model, not a neutral playground.