Overview
For many complex tasks, there is no simple mathematical formula that defines a 'reward.' A reward model is trained to act as a proxy for human judgment, assigning a scalar score to a model's output.
Training a Reward Model
In the context of LLMs, human annotators rank several candidate responses to the same prompt from best to worst. These rankings are converted into pairwise comparisons and used to train the reward model to assign higher scores to the responses humans preferred.
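The pairwise training objective can be sketched with a small example. This is a minimal illustration, not a real implementation: the `score` function stands in for the reward model (here a hypothetical linear scorer over made-up response features), and the loss is the standard Bradley-Terry-style negative log-sigmoid of the score margin between the preferred and rejected response.

```python
import numpy as np

def score(weights, features):
    """Reward-model stand-in: maps response features to a scalar score."""
    return float(np.dot(weights, features))

def preference_loss(weights, chosen, rejected):
    """-log sigmoid(r_chosen - r_rejected): small when the model scores
    the human-preferred response above the rejected one."""
    margin = score(weights, chosen) - score(weights, rejected)
    return float(np.log1p(np.exp(-margin)))  # numerically stable form

# Toy data: two illustrative features per response (values are arbitrary).
weights = np.array([1.0, -0.5])
chosen = np.array([2.0, 1.0])    # response the annotator preferred
rejected = np.array([0.5, 1.0])  # response the annotator rejected

loss = preference_loss(weights, chosen, rejected)
```

Minimizing this loss over many comparison pairs pushes the model to give preferred responses higher scores, which is all the downstream RL step needs.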
Role in RLHF
Once trained, the reward model scores the main model's outputs during reinforcement learning, providing the feedback signal that algorithms such as PPO use to update the policy.
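In practice, the reward-model score is usually combined with a KL penalty that discourages the policy from drifting too far from the original (pre-RL) model. A hedged sketch, where the numbers and the `beta` coefficient are illustrative assumptions rather than values from any particular system:

```python
def rlhf_reward(rm_score, logprob_policy, logprob_reference, beta=0.1):
    """Per-response reward for PPO-style RLHF: the reward model's score
    minus a penalty for deviating from the reference model."""
    kl_term = logprob_policy - logprob_reference  # per-sample KL estimate
    return rm_score - beta * kl_term

# The policy assigns the response a higher log-probability than the
# reference model, so the penalty slightly reduces the raw score.
r = rlhf_reward(rm_score=1.8, logprob_policy=-12.0, logprob_reference=-14.0)
```

PPO then optimizes the policy to maximize this combined reward, so the model chases high reward-model scores without collapsing into degenerate outputs the penalty would flag.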