Order-theoretic optimization

Many of the intuitive properties of optimization can be captured at an extremely general level with order theory. We think this is good justification for stating our definition of optimization in order-theoretic terms.

It is, of course, not expected that ML engineers, regulators, et cetera will need to understand anything about order theory for these ideas to be applied to AI safety. Instead, we hope to use order theory to prove broadly applicable theorems, which can then be applied to specific AI systems.

Why is this useful?

One reason it can be useful to work at this level of generalization is deconfusion. Often in mathematics research, a generalization is simpler to understand; there are fewer premises and fewer moving parts in the definitions. There are fewer steps to verify, and it can be easier to “see” which things are true.

A second reason that generalization can be useful is that if your theorems are still valid at a higher level of generality, then the conclusions are more likely to apply to reality! When you use a mathematical theorem to reach a conclusion about something in reality, you are relying on all of the premises being true of that part of reality. If you can generalize the theorem, then you are relying on strictly fewer modelling assumptions about reality.

Order relations

An order relation is any abstract way to put a set of elements “in order”. For example, you can often order physical objects by height, or color, or weight. If the elements are numbers, then the order could simply be the normal sense of “greater than” or “less than”. But if the elements are more complicated objects, like states of the world, then you may need to generalize beyond numerical ordering. Orders are relevant to AI safety because values are a type of order; a given person will consider certain world states better or worse than others, according to their values.1

The first key property of an ordering over a set is that for any two distinct elements, you can say which one is “greater”. The second key property is that the order is “transitive”: if a is less than b and b is less than c, then a is less than c. (This is obvious for numbers, but for a proposed relation it is something that has to be proved before you can be sure it will act like an ordering.)

The state ordering

In order-theoretic optimization, the states of the system will have an order. You could consider this to be a “preference” ordering, that is, a statement about which states are “better” or “worse”. However, be careful not to assume that the system contains an optimizer; it may simply be an optimizing system with no optimizer, and therefore “no one” to say what is better or worse.

The time ordering

The other order relation relevant to optimization is time. Time is usually modeled as part of the real number line, or as the integers. It’s very common in mathematics for these two cases to be taught as separate but similar (like sums vs. integrals). In the case of optimization, all we really care about is that any two points in time can be compared; that you can say which one is “before” and which one is “after”. This is exactly the structure of an ordering, and so we may as well use this machinery for the time axis as well as the state ordering.
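To illustrate the point, here is a hedged sketch: code written only against an abstract “before” relation works unchanged whether time points are integer ticks or real-valued timestamps. The function name and relation are illustrative assumptions, not established notation.

```python
def earliest(times, before):
    """Return the time point preceding all others, using only the order."""
    first = times[0]
    for t in times[1:]:
        if before(t, first):
            first = t
    return first

before = lambda t1, t2: t1 < t2  # works for ints and floats alike
print(earliest([3, 1, 2], before))          # 1    (discrete time)
print(earliest([0.5, 0.25, 0.75], before))  # 0.25 (continuous time)
```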

Comparing trajectories

Given a trajectory of states over time, we can easily say whether it is going up the state ordering. However, we cannot as easily say whether one trajectory is optimizing more or less strongly than another trajectory. To do this, we would typically need to bring in more structure than just an ordering, like a probability measure, or even a utility function.
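The easy half of this claim can be made concrete: given only the two orderings, we can check whether a single trajectory ascends the state order. The representation below (a list of time–state pairs) and the function name are our own illustrative assumptions.

```python
def is_ascending(trajectory, state_leq):
    """True if every later state is at least as high in the state order.

    `trajectory` is a list of (time, state) pairs. We use the time
    ordering only to sort, and the state ordering only to compare.
    """
    ordered = sorted(trajectory, key=lambda ts: ts[0])
    states = [s for _, s in ordered]
    return all(state_leq(a, b) for a, b in zip(states, states[1:]))

# Hypothetical example: states are heights of a ball in a valley,
# and "better" means lower (closer to the bottom of the hill).
traj = [(0, 9.0), (1, 4.0), (2, 1.0), (3, 0.5)]
print(is_ascending(traj, lambda a, b: a >= b))  # True: height never increases
```

Comparing the *strength* of optimization between two such trajectories is exactly where this function runs out of road: the orderings alone give no way to say how much one trajectory climbed relative to another, which is why extra structure like a measure or utility function is needed.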

  1. That said, an important part of formalizing optimization is to abstract away from this idea of preferences; a ball rolling down a hill is optimizing, but it is not having an internal experience of feeling good about rolling down the hill.