Values

A representation of values is one of the key components of an agent. These values are what the agent is “trying” to achieve.

Value representation

The canonical way to represent values is with a utility function. This is a mathematical function whose input is a specification of a world state and whose output is a real number representing how “good” that state is, that is, how much the agent “likes” it.
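
As a toy illustration, a utility function can be thought of as a lookup from states to real numbers. This is only a sketch: the state labels and the particular values below are illustrative assumptions, not drawn from any standard formalism.

    # A minimal sketch: a utility function as a map from (toy) world states
    # to real numbers. The states and values are illustrative assumptions.
    WorldState = str  # stand-in for a full world-state specification

    utility_table = {
        "room_tidy": 1.0,
        "room_messy": -0.5,
        "room_on_fire": -100.0,
    }

    def u(state: WorldState) -> float:
        """Return the utility the agent assigns to a world state."""
        return utility_table[state]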

However, a utility function is usually considered to be an idealization of a representation of values. Most things that we could consider agents will not have a genuine utility function; instead they will have some other representation of their values. Understanding the available options, and the implications of the different forms of representation, is one of the open problems in agent foundations.

Preference orderings

Arguably the weakest representation of values is a preference ordering over the state space. While optimization is defined in terms of such orderings, the ordering is not necessarily represented inside the system (say, by an optimizer).
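
As a sketch, a preference ordering can be represented as nothing more than a ranking of states, with no numbers attached; the states and their order below are illustrative assumptions.

    # A minimal sketch: a preference ordering as a ranking of toy states,
    # from least to most preferred. States and order are assumptions.
    ranking = ["room_on_fire", "room_messy", "room_tidy"]

    def weakly_prefers(a: str, b: str) -> bool:
        """True if the agent finds state a at least as good as state b."""
        return ranking.index(a) >= ranking.index(b)

Note that nothing in this representation says how much better one state is than another, which is exactly what the next paragraph needs.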

Representing values merely as a preference ordering would not allow an agent to judge the value of its actions under uncertainty. As a simplified example, say that going left keeps the world in the same state, but that going right could lead to one of two outcomes: one bad, one good. To decide, the agent needs some way to judge whether the possibility of the bad outcome is “worth” the possibility of the good outcome, compared to the neutral outcome that going left guarantees. This is usually done by assigning probabilities to the possible outcomes, assigning a real-numbered value called “utility” to the good, bad, and neutral outcomes, and then computing the expected utility of each action.
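
A minimal sketch of this decision follows; the probabilities and utilities are illustrative assumptions, not numbers from the text.

    # Going left keeps the neutral state for certain; going right is a
    # gamble between a good and a bad outcome. Probabilities and utilities
    # are illustrative assumptions.
    p_good, p_bad = 0.6, 0.4
    u_good, u_bad, u_neutral = 10.0, -20.0, 0.0

    expected_left = u_neutral                         # certain outcome
    expected_right = p_good * u_good + p_bad * u_bad  # 0.6*10 + 0.4*(-20) = -2.0

    best_action = "left" if expected_left >= expected_right else "right"
    print(best_action)  # left

With these particular numbers the gamble is not worth taking. A bare preference ordering over the three outcomes could not have settled the question either way.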

Domain of the values

In a simplified agent theory, the agent might have a utility function whose domain is the set of fully specified world states. However, there are a number of serious problems with this view.

Ultimately, the agent’s values must be specified in terms of its world model.1 Its world model will typically be an enormously reduced, and probably lossy, representation of the world. As such, a world-model state will also be a very compressed representation of a world state. This leads to several problems for agent foundations to solve (a small sketch of the compression follows the list):

  • To the degree that this world model is wrong, it is not clear how to assess the value of the actual world state, and therefore the performance of the agent.
  • In the typical case, an agent will learn; that is, its world model will change over time. Some changes may render the previous value function meaningless: there may be no sensible way to update the value function to match the new world model. This is sometimes called an ontological crisis.
  • Similarly, two agents in the same world will typically have different world models. The problem of AI alignment can be described in these terms, with humans as one of the agents: we need to make sure that the AI’s values are sufficiently similar to ours. This problem needs to be solved for the case where the world models differ, since they certainly will.
  • Another significant problem is that the agent could have values over its internal states. This is not hypothetical; pain, for example, is an internal state that people have preferences over. Such an agent will need a world model that includes its own internal states. This violates the simplifying assumptions of the agent/environment setting, and instead requires the setting of embedded agency.
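
As a small illustration of the compression discussed above the list, here is a sketch of values defined over a lossy world model rather than over the world itself; the abstraction map, states, and numbers are all illustrative assumptions.

    # The model collapses many distinct world states into one model state,
    # so a utility function defined on model states cannot distinguish
    # between them. States and values are illustrative assumptions.
    world_to_model = {
        "tidy_with_window_open": "room_tidy",
        "tidy_with_window_closed": "room_tidy",   # distinction lost
        "messy_with_window_open": "room_messy",
    }

    model_utility = {"room_tidy": 1.0, "room_messy": -0.5}

    def u(world_state: str) -> float:
        """Utility of a world state, as seen through the compressed model."""
        return model_utility[world_to_model[world_state]]

If the mapping is wrong, or changes as the agent learns, the values assigned to actual world states inherit those problems, as described in the first two bullet points.
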
  1. It may be that the agent’s values are specified directly over its observations. This is consistent, because all world models must ultimately be grounded in observations. But it is a fairly impoverished type of value structure.