Optimization

Optimization is one of the core concepts of agent foundations.

A one-line gloss of what we are trying to point at is: optimization steers the future into a smaller region of the state-space.

It is generally agreed that optimization is a core concept in theoretical AI safety, but there is no agreement on exactly what it is, either formally or intuitively. Different researchers define it differently, and most of the discussion so far has been a collective attempt to deconfuse the concept. Some key readings for understanding the agent foundations concept are:

Yudkowsky’s Measuring Optimization Power (2008) and surrounding sequence posts
Flint’s The ground of optimization (2020)
Altair’s draft sequence on optimization (2023)

Relevance to AI risk

AI is a type of optimizing process. An optimization process steers the future, and thus the value of the future depends on the nature of the optimization process. Since we don’t understand the nature of optimization very well in general, we cannot currently do much to ensure that this steering process goes well.

While some optimization is dangerous, not all of it is.

An agent is a special type of optimization process. Agents seem especially dangerous, because, unlike a ball rolling into a valley, agents are “trying” to make the relevant changes.

Terminology

The terminology around optimization has not yet settled down. At Dovetail, we try to use the following terms consistently:

Optimization: the general phenomenon.
Optimizing system: a system (dynamical or otherwise) whose trajectory moves up a given ordering on its states.
Optimizer: if a system’s states have a concept of locality, then we may be able to attribute the system’s optimization to a specific region of the state; that region is the optimizer.
Bit of optimization: if the state space has a probability measure over it, then changes in the state can be measured in bits of surprisal.
Optimization power: optimization per unit time, by analogy with power in physics, which is defined as energy per unit time.
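As a rough illustration of the last two terms, the sketch below counts bits on a toy finite state space. It assumes one particular reading, in the spirit of Yudkowsky's post: the bits credited to a state are the surprisal of landing on that state or on any state ranked at least as high. The names `bits_of_optimization`, `optimization_power`, `prob`, and `rank` are hypothetical and chosen only for this example; other formalizations are possible.

```python
import math

def bits_of_optimization(prob, rank, state):
    """Surprisal (in bits) of reaching `state` or anything ranked at least as high.

    prob: dict mapping each state to its probability under the assumed measure.
    rank: dict mapping each state to its position in the assumed ordering.
    """
    mass = sum(p for s, p in prob.items() if rank[s] >= rank[state])
    return -math.log2(mass)

def optimization_power(prob, rank, start, end, elapsed):
    """Bits gained between `start` and `end`, divided by the elapsed time."""
    gained = bits_of_optimization(prob, rank, end) - bits_of_optimization(prob, rank, start)
    return gained / elapsed

# Toy example: eight equally likely states, ranked 0 (worst) to 7 (best).
states = range(8)
prob = {s: 1 / 8 for s in states}
rank = {s: s for s in states}

top_bits = bits_of_optimization(prob, rank, 7)      # only 1/8 of the measure is this good
mid_bits = bits_of_optimization(prob, rank, 4)      # half of the measure is this good
power = optimization_power(prob, rank, 0, 7, elapsed=6)
```

In this toy case, reaching the top-ranked state is worth 3 bits, reaching state 4 is worth 1 bit, and a system that climbs from the bottom to the top in 6 time units has a power of 0.5 bits per unit time.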

Different measures of optimization strength

An order-theoretic optimizing trajectory has two measures of strength. These apply to all other types of optimization.
How far up the ordering
Duration of time

If the state space has a probability measure over it, then we can refine these two measures.
Bits
Power

If the system contains an optimizer, then we can define more.
Robustness
Retargetability
Expected utility
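
The two order-theoretic measures can be made concrete with a small sketch. It assumes a finite trajectory and a numeric rank standing in for the state ordering, and it reads "duration" as the number of consecutive time steps during which the trajectory does not move down the ordering. That reading, and the names `climb_and_duration`, `trajectory`, and `rank`, are assumptions made for illustration.

```python
def climb_and_duration(trajectory, rank):
    """Return (how far up the ordering, number of steps) for the initial
    stretch of `trajectory` that never moves down the ordering.

    trajectory: list of states visited, in time order.
    rank: dict mapping each state to its position in the assumed ordering.
    """
    ranks = [rank[s] for s in trajectory]
    steps = 0
    for prev, cur in zip(ranks, ranks[1:]):
        if cur < prev:
            break  # the trajectory moved down, so the optimizing stretch ends here
        steps += 1
    return ranks[steps] - ranks[0], steps

# Toy example: the system climbs from "a" to "c", then slips back to "b".
rank = {"a": 0, "b": 1, "c": 3}
climb, duration = climb_and_duration(["a", "b", "c", "b"], rank)
```

Here the trajectory climbs 3 rank levels over 2 steps before it first moves down. Plateaus are counted as part of the optimizing stretch in this sketch; a stricter reading could require every step to go strictly up.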
