Human-Compatible AI

Organisation

Content

Provably beneficial and well-founded AI, probabilistic safety guarantees, switch-off problem
Human-aware and human-centered AI: mental models, interpretable behavior and generation of explanations, agent-supported human collaboration, aligned AI, analogies and common sense
Adaptation of language models: reinforcement learning from human feedback (PPO method)
Task-oriented perception: from task descriptions to internal goals, task representations
Basics of assistance games, perception of human preferences, inverse reinforcement learning
Simulation of agent behavior in mechanisms, Durkheim test, Weizenbaum test