Human-Compatible AI
Content
- Provably beneficial and well-founded AI, probabilistic
- Safety guarantees, switch-off problem
- Human-aware and human-centered AI: mental models, interpretable behavior and generation of explanations, agent-supported human collaboration, aligned AI, analogies and common sense
- Adaptation of language models: reinforcement learning from human feedback (PPO method)
- Task-oriented perception: from task descriptions to internal goals, task representations
- Basics of assistance games, perception of human preferences, inverse reinforcement learning
- Simulation of agent behavior in mechanisms
- Durkheim test, Weizenbaum test