Overview
As AI systems become more capable, the risk of misalignment grows: an AI may pursue its objective in ways its creators neither intended nor endorse, with potentially harmful results.
Key Challenges
- Outer Alignment: Specifying goals and reward functions that actually capture what we want, so the objective we write down matches human intent (a toy failure mode is sketched after this list).
- Inner Alignment: Ensuring that the trained model actually pursues the specified objective, rather than developing unintended internal sub-goals during training.
- Scalable Oversight: Supervising and evaluating AI systems whose capabilities exceed those of their human overseers.
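To make the outer-alignment problem concrete, here is a minimal sketch of how a proxy reward can diverge from the true objective. The toy "cleaning robot" environment and all names in it are hypothetical illustrations, not from any specific system:

```python
# Hypothetical toy environment: a robot is rewarded for removing *visible*
# mess (the proxy), but what we actually want is *all* mess removed.

def true_objective(state):
    """What we actually want: no mess left anywhere."""
    return -sum(state["mess"].values())

def proxy_reward(state):
    """What we wrote down: penalize only mess the robot can see."""
    return -sum(amount for spot, amount in state["mess"].items()
                if spot not in state["covered"])

state = {"mess": {"floor": 3, "counter": 2}, "covered": set()}

# Honest policy: clean the floor (improves both metrics).
state["mess"]["floor"] = 0
print(proxy_reward(state), true_objective(state))   # -2 -2

# Reward-hacking policy: hide the counter mess instead of cleaning it.
# The proxy reward improves while the true objective does not.
state["covered"].add("counter")
print(proxy_reward(state), true_objective(state))   # 0 -2
```

The gap between the two final scores is the outer-alignment failure: an optimizer pointed at the proxy is actively incentivized to cover mess rather than clean it.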
Techniques
Reinforcement learning from human feedback (RLHF) is currently the most widely used practical alignment technique for LLMs: human raters compare pairs of model outputs, a reward model is trained to predict those preferences, and the LLM is then fine-tuned to maximize the learned reward.
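As a rough illustration of the reward-modeling step, the sketch below trains a scalar reward head on preference pairs using PyTorch. The `RewardModel` class, `EMBED_DIM`, and the random tensors standing in for response embeddings are hypothetical stand-ins for a real LLM backbone; the pairwise Bradley-Terry loss, however, is the standard objective used here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

EMBED_DIM = 64  # hypothetical; in practice this is the LLM's hidden size

class RewardModel(nn.Module):
    """Maps a response embedding to a scalar reward."""
    def __init__(self, dim: int):
        super().__init__()
        self.head = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(x).squeeze(-1)

model = RewardModel(EMBED_DIM)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stand-in batch: embeddings of human-preferred vs. rejected responses.
chosen = torch.randn(8, EMBED_DIM)
rejected = torch.randn(8, EMBED_DIM)

# Bradley-Terry pairwise loss: push r(chosen) above r(rejected).
loss = -F.logsigmoid(model(chosen) - model(rejected)).mean()
loss.backward()
opt.step()
print(f"pairwise preference loss: {loss.item():.3f}")
```

In a full RLHF pipeline, the learned reward would then drive a policy-optimization step (typically PPO) that fine-tunes the LLM itself.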