Ideal agent

It may be most natural to define agents in contrast to an idealized version. Ideal agent behavior is relatively easy to define (e.g. attaining maximum expected utility); we also wish to define an ideal agent structure. One could also reasonably talk about agents that are ideal in some respects but not others, or ideal with respect to a given assumption. For example, you could talk about the ideal computable agent, the ideal polynomial-time agent, or the agent that maximizes utility given a simplified world model.

Because of this, there won’t be a single ideal agent. It is not yet clear which versions of ideal agency will be more or less useful for making progress in AI safety.

Because an agent is a collection of parts (value representation, world model, and planning), defining an ideal agent boils down to defining an ideal version of each of these parts.

Value representation

Ideal optimization

Generalizing away from agents for a moment, let's consider the simplest form of values: a preference ordering over states. An ideal achiever of a preference ordering would be an ideal optimization process (and not necessarily an agent). Already, we have options for what this could mean, because there are multiple measures of optimization strength. For a given ordering, you could call a process ideal if:

  • the final state is all the way up the ordering;
  • it starts all the way down the ordering and ends all the way up;
  • it goes from the bottom to the top of the ordering in the smallest possible time interval; or
  • it optimizes more strongly than any other process in some chosen class (for some selected measure of optimization strength).

The sketch below makes the first three measures concrete.
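As a minimal illustration (our own sketch, not a standard formalism), identify each state with its rank in the ordering, so that a process is just the trajectory of ranks it visits:

```python
from typing import List

# Three candidate measures of optimization strength for a process,
# represented as the trajectory of state ranks it visits
# (higher rank = more preferred under the given ordering).

def final_rank(trajectory: List[int]) -> int:
    """Strength as where the process ends up the ordering."""
    return trajectory[-1]

def rank_gain(trajectory: List[int]) -> int:
    """Strength as how far up the ordering the process climbs."""
    return trajectory[-1] - trajectory[0]

def gain_per_step(trajectory: List[int]) -> float:
    """Strength as climb per unit time (rewards small time intervals)."""
    steps = max(len(trajectory) - 1, 1)
    return (trajectory[-1] - trajectory[0]) / steps

slow_climb = [0, 1, 2, 3, 4, 5]  # bottom to top, but slowly
fast_finish = [4, 5]             # ends at the top almost immediately

print(final_rank(slow_climb), final_rank(fast_finish))        # 5 5
print(rank_gain(slow_climb), rank_gain(fast_finish))          # 5 1
print(gain_per_step(slow_climb), gain_per_step(fast_finish))  # 1.0 1.0
```

The same two processes tie or differ depending on which measure is chosen, which is why "ideal optimization" does not pick out a single definition.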

Utility maximization

The most common model of an ideal agent is a utility maximizer.

Historically, an early formalization of utility maximization was the von Neumann-Morgenstern theorem. However, that description of utility maximization is purely behavioral (in fact, it's not even about behavior over time, but about behavior over a collection of independent choices). When Dovetail talks about a utility maximizer, we mean a specific structure: an algorithm which maximizes utility by construction.
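As a toy illustration of what "maximizes utility by construction" means (a sketch under simplifying assumptions: a one-shot choice, finitely many actions, and known outcome probabilities; all names are hypothetical):

```python
# Sketch: an expected-utility maximizer whose optimality holds by
# construction, because the argmax is literally part of the algorithm.

def choose_action(actions, outcome_probs, utility):
    """Pick the action with the highest expected utility.

    actions: iterable of available actions
    outcome_probs: function action -> {outcome: probability}
    utility: function outcome -> real-valued utility
    """
    def expected_utility(action):
        return sum(p * utility(o) for o, p in outcome_probs(action).items())
    return max(actions, key=expected_utility)
```

Contrast this with the von Neumann-Morgenstern picture, which only guarantees that an agent's choices can be described as if such a maximization were happening.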

World model

An ideal agent could be defined with respect to a single environment, or with respect to an environment class.
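One way to make this precise (a sketch; the prior w over the class is our own assumption, and a worst-case formulation would do equally well): an agent is ideal for a single environment if it maximizes expected utility there, and ideal for a class if it maximizes prior-weighted expected utility across the class.

```latex
% Ideal with respect to a single environment \mu:
\pi^*_\mu \in \arg\max_\pi \, \mathbb{E}^\pi_\mu[U]

% Ideal with respect to an environment class \mathcal{M}, given a prior w:
\pi^*_\mathcal{M} \in \arg\max_\pi \sum_{\mu \in \mathcal{M}} w(\mu) \, \mathbb{E}^\pi_\mu[U]
```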

Learning the world model

As discussed on the page for world-model, we're not particularly interested in the process by which the agent learns the world model. Consequently, our concept of an ideal agent doesn't need to "ideally learn" the world model. A lot of previous work is about ideal learning: for example, Solomonoff induction, or ML results about converging to a global optimum. Arguably, Bayesian updating itself is a formalization of optimal learning.
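For instance, here is a minimal sketch of exact Bayesian updating over a finite hypothesis class (the finite class and known likelihoods are simplifying assumptions; Solomonoff induction is the same idea taken over all computable hypotheses):

```python
# Sketch: exact Bayesian updating over a finite set of hypotheses.
# posterior(h) is proportional to prior(h) * P(observation | h).

def bayes_update(prior, likelihood, observation):
    """prior: dict hypothesis -> probability
    likelihood: function (observation, hypothesis) -> P(observation | hypothesis)
    Returns the posterior as a new dict."""
    unnormalized = {h: p * likelihood(observation, h) for h, p in prior.items()}
    total = sum(unnormalized.values())
    return {h: u / total for h, u in unnormalized.items()}

# Example: is the coin fair or heads-biased?
prior = {"fair": 0.5, "biased": 0.5}
def likelihood(obs, h):
    p_heads = {"fair": 0.5, "biased": 0.9}[h]
    return p_heads if obs == "heads" else 1 - p_heads

posterior = bayes_update(prior, likelihood, "heads")
# {'fair': 0.357..., 'biased': 0.642...}
```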

Computability

Although an agent needs to be computable, it may still be useful to compare it against an uncomputable ideal. In the same way that there is no greatest rational number less than pi, there may be no single computable agent which achieves the greatest expected utility below that of the ideal uncomputable agent.

AIXI

A similar model of ideal agency which has been fully formalized is the AIXI model by Marcus Hutter. This model includes the following:

  • strict agent/environment separation
  • ideal Bayesian updating over the class of all computable environments
  • maximizing “reward”, which is modeled as a hard-coded part of the observation stream.

It includes optimal learning of the world model, in the Bayesian sense, assuming the true environment is computable and so lies in the class being updated over.
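For reference, AIXI's action choice can be written compactly (following Hutter's expectimax formulation; here U is a universal monotone Turing machine, ℓ(q) the length of program q, and m the horizon; notation varies across presentations):

```latex
a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
       \left[ r_k + \cdots + r_m \right]
       \sum_{q \,:\, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

The inner sum over programs q is the Solomonoff-style mixture over all computable environments, and the rewards r_i are read off the observation stream, as in the list above.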