Overview

In many complex tasks, there is no simple mathematical formula for a 'reward.' A reward model is trained on human feedback to act as a proxy for human judgment, assigning a scalar score to a model's output.

Training a Reward Model

In the context of LLMs, human labelers rank several different AI responses to the same prompt from best to worst. These rankings are used to train the reward model to assign higher scores to the responses that humans prefer.
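Rankings are typically broken down into pairwise comparisons, and the reward model is commonly trained with a Bradley-Terry style loss that pushes the score of the preferred response above the rejected one. A minimal sketch in plain Python (the function name and toy scores are illustrative, not from any specific library):

```python
import math

def pairwise_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry style preference loss: -log sigmoid(r_chosen - r_rejected).

    The loss is small when the reward model scores the human-preferred
    response higher than the rejected one, and large when the ranking
    is inverted.
    """
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Correct ranking with a wide margin -> small loss
print(pairwise_loss(2.0, 0.0))
# Inverted ranking -> large loss
print(pairwise_loss(0.0, 2.0))
```

Minimizing this loss over many labeled pairs teaches the model to reproduce the human ordering, without ever needing an absolute definition of 'good.'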

Role in RLHF

Once trained, the reward model scores the main AI model's outputs, providing the feedback signal that the model uses to improve its behavior via reinforcement-learning algorithms like PPO (Proximal Policy Optimization).
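In common RLHF setups, the signal handed to PPO is not the raw reward-model score alone: a KL-style penalty is subtracted to keep the fine-tuned policy from drifting too far from the original model. A minimal sketch of that shaping step, assuming log-probabilities from both models are available (`beta` and all names here are illustrative):

```python
def shaped_reward(rm_score: float,
                  logprob_policy: float,
                  logprob_ref: float,
                  beta: float = 0.1) -> float:
    """Reward signal used during PPO fine-tuning.

    rm_score        -- scalar score from the trained reward model
    logprob_policy  -- log-probability of the response under the current policy
    logprob_ref     -- log-probability under the frozen reference (pre-RLHF) model
    beta            -- strength of the KL-style penalty (illustrative value)
    """
    kl_penalty = logprob_policy - logprob_ref
    return rm_score - beta * kl_penalty

# No drift from the reference model: the reward-model score passes through
print(shaped_reward(1.0, -2.0, -2.0))
# Policy assigns higher probability than the reference: score is penalized
print(shaped_reward(1.0, -1.0, -2.0))
```

The penalty discourages the policy from exploiting quirks of the reward model ('reward hacking') by producing outputs the original model would find implausible.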

Related Terms