Concrete problems in AI safety

Posted: 2017-02-01, Modified: 2017-04-01

Tags: ai safety, machine learning

The focus is on the empirical study of practical safety problems in modern machine learning systems, an approach the authors believe is likely to be robustly useful across a broad variety of potential risks, both short- and long-term.

No need to invoke extreme scenarios, which can be speculative and imprecise.

Ex. a cleaning robot, used as the running example for each problem below.

Wrong objective function

Avoid negative side effects

The agent should not disturb its environment in negative ways while pursuing its goal, e.g. the cleaning robot knocking over a vase because that lets it clean faster.

Approaches

Define an impact regularizer that penalizes change to the environment relative to a baseline, learn the impact regularizer, penalize the agent's influence (empowerment), multi-agent approaches, reward uncertainty. (A toy sketch of an impact penalty follows below.)

Experiments

Toy environments with a simple goal but many opportunities for side effects, to test whether learned side-effect avoidance transfers across tasks.
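One concrete version of an impact regularizer is to penalize how far the world ends up from where it would have been had the agent done nothing. Below is a minimal sketch of that penalty; the feature names, the no-op baseline, and the weight `beta` are illustrative assumptions, not anything specified in the paper.

```python
import numpy as np

def shaped_reward(task_reward, state_features, baseline_features, beta=1.0):
    """Task reward minus a crude impact penalty: L1 distance between the
    state reached by acting and the state the world would have reached had
    the agent done nothing (both assumed observable in this toy setting)."""
    impact = np.abs(np.asarray(state_features) - np.asarray(baseline_features)).sum()
    return task_reward - beta * impact

# Hypothetical cleaning-robot side-effect features: [vases_broken, furniture_moved]
after_action = np.array([1.0, 3.0])   # cleaned fast, but broke a vase and shoved furniture
after_noop   = np.array([0.0, 0.0])   # the world if the robot had done nothing
task_reward  = 3.0                    # reward for dirt removed

print(shaped_reward(task_reward, after_action, after_noop, beta=0.5))
```

A real measure would also need to avoid penalizing intended effects (the dirt actually removed), which is part of why the paper also considers learning the regularizer rather than hand-specifying it.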

Avoid reward hacking

The agent should not game its reward function, e.g. the cleaning robot covering over messes with opaque material, or disabling its vision so it never finds any mess to report.

Note: hacking can arise from partially observed goals, complicated systems, abstract rewards, Goodhart's law, feedback loops, and environmental embedding (wireheading).

ML approaches

Adversarial reward functions, model lookahead, adversarial blinding, careful engineering, reward capping, counterexample resistance, multiple rewards, reward pretraining, variable indifference, trip wires. (A toy sketch of reward capping plus a trip wire follows below.)

Experiments

Delusion-box-style environments in which the agent can tamper with its own perception of reward.
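Two of the listed countermeasures are easy to picture in code: capping implausibly large rewards and planting a "trip wire" that a well-behaved agent has no reason to touch. A toy sketch, assuming hypothetical state names, a cap value, and a halt-and-alert response that are my illustrations rather than the paper's.

```python
def capped_reward(raw_reward, cap=10.0):
    """Reward capping: implausibly large rewards are more likely to be hacks
    than genuine achievement, so clip them before the agent sees them."""
    return min(raw_reward, cap)


class TripWire:
    """A deliberately planted vulnerability that a well-behaved agent has no
    reason to touch; entering it suggests the agent is probing for hacks."""

    def __init__(self, forbidden_states):
        self.forbidden_states = set(forbidden_states)

    def check(self, state):
        return state in self.forbidden_states


# Hypothetical states: 'sensor_covered' is the planted exploit
# (the robot "cleans" by covering its own mess sensor).
wire = TripWire({"sensor_covered"})
for state, raw in [("cleaning", 2.0), ("sensor_covered", 1e6)]:
    if wire.check(state):
        print(f"trip wire hit in state {state!r}: halt and alert a human")
        break
    print(f"state {state!r}: capped reward {capped_reward(raw)}")
```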

Expensive objective function

Scalable supervision

The true objective may be too expensive to evaluate frequently, so the agent must do the right thing with only limited access to it, e.g. the robot should throw out items that are unlikely to belong to anyone, but asking a human about every single item is too costly.

[Also explanations?]

Approaches

Semi-supervised reinforcement learning (the true reward is observed on only a small fraction of episodes or timesteps), distant supervision, hierarchical reinforcement learning.

Ex. learn a reward predictor from the episodes a human does label and use it as a cheap proxy on the rest; see the sketch after this section.

Experiments

Semi-supervised RL in basic control environments (e.g. cartpole) and simple games, with the true reward revealed only occasionally.
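A minimal sketch of that semi-supervised idea: the expensive true reward is available for only a handful of episodes, so fit a cheap predictor on those and use it elsewhere. The proxy features, the linear model, and the budget of 20 labeled episodes are all illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Cheap, always-available proxy features per episode, e.g.
# [dirt_collected, supplies_used, time_taken] (hypothetical names).
features = rng.normal(size=(200, 3))

# The expensive "true" objective (e.g. a careful human evaluation).
# Simulated here; in practice it is only affordable on a few episodes.
true_reward = features @ np.array([1.0, -0.5, -0.2]) + 0.1 * rng.normal(size=200)

# Semi-supervised setting: the human labels only the first 20 episodes.
X, y = features[:20], true_reward[:20]

# Fit a cheap reward predictor by least squares on the labeled episodes.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Use the predictor as the training signal on the unlabeled episodes,
# occasionally comparing against fresh true labels to catch drift.
predicted = features @ w
print("mean abs error on unlabeled episodes:",
      np.abs(predicted[20:] - true_reward[20:]).mean())
```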

Undesirable behavior during the learning process

Safe exploration

Exploratory actions should not have catastrophic consequences, e.g. the robot may experiment with mopping strategies, but should not put a wet mop in an electrical outlet.

Approaches

Risk-sensitive performance criteria, use of demonstrations, simulated exploration, bounded exploration, trusted policy oversight, human oversight. (A toy sketch of bounded exploration follows below.)

Experiments

A suite of toy environments containing known catastrophic actions, testing whether agents can learn to avoid them even while exploring.

There is a risk that such catastrophes are idiosyncratic and that methods overfit to them, so the suite needs a broad set of conceptually distinct pitfalls.
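Bounded exploration is perhaps the simplest of the listed approaches to sketch: restrict exploratory choices to a whitelist of actions believed safe, for example actions a demonstrator has used. The action names, Q-values, and whitelist below are hypothetical.

```python
import random

def safe_epsilon_greedy(q_values, safe_actions, epsilon=0.1):
    """Epsilon-greedy action selection restricted to a whitelist of actions
    believed safe. Exploration never samples outside the whitelist, so a
    single exploratory step cannot pick a known-catastrophic action."""
    candidates = [a for a in q_values if a in safe_actions]
    if random.random() < epsilon:
        return random.choice(candidates)      # explore, but only within the safe set
    return max(candidates, key=q_values.get)  # exploit the best safe action

# Hypothetical Q-values: the dangerous action looks attractive to a naive learner.
q = {"mop_floor": 1.2, "dust_shelf": 0.8, "mop_electrical_outlet": 5.0}
safe = {"mop_floor", "dust_shelf"}            # illustrative whitelist, e.g. from demonstrations
print(safe_epsilon_greedy(q, safe, epsilon=0.2))
```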

Distributional shift

The agent should recognize when its environment differs from the one it was trained in and behave robustly there, or ask for help, e.g. cleaning strategies learned in an office may be wrong or dangerous on a factory work floor.

Approaches

Importance weighting under a covariate shift assumption, well-specified models (e.g. generative modeling, method of moments), partially specified models, training on multiple distributions, and deciding how to respond once out of distribution (act conservatively, ask a human). (A toy out-of-distribution check is sketched below.)

Experiments

Train systems such as speech recognizers on a standard dataset and evaluate them on out-of-distribution data, measuring both calibration of uncertainty and detection of novelty.
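A crude sketch of detecting distributional shift: compare incoming observations against per-feature statistics of the training data and defer to a human when something looks far out of range. The z-score test, the threshold, and the sensor readings are illustrative assumptions; real detectors would be considerably more sophisticated.

```python
import numpy as np

rng = np.random.default_rng(1)

# "Training" observations, e.g. sensor readings from the office the robot
# was trained in (hypothetical data).
train = rng.normal(loc=0.0, scale=1.0, size=(1000, 4))
mean, std = train.mean(axis=0), train.std(axis=0)

def out_of_distribution(x, threshold=4.0):
    """Flag inputs far from the training distribution (per-feature z-score):
    a crude stand-in for detecting novel situations so the agent can fall
    back to conservative behavior or ask a human."""
    z = np.abs((x - mean) / std)
    return bool(z.max() > threshold)

office_reading  = rng.normal(0.0, 1.0, size=4)      # looks like training data
factory_reading = np.array([0.1, 9.0, -0.3, 0.2])   # one sensor far outside the training range

for name, obs in [("office", office_reading), ("factory", factory_reading)]:
    if out_of_distribution(obs):
        print(f"{name}: novel input, deferring to a human")
    else:
        print(f"{name}: acting normally")
```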