In the context of agent foundations, an agent will exist inside a larger system which you could call the world, universe, or environment.
In order to robustly achieve any kind of goal, the agent will need to have a model of the world around it. Specifically, the purpose of the model is to allow the agent to predict the outcome of candidate actions it could take.
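To make this concrete, here is a minimal sketch (not drawn from any particular formalism in this text) of an agent using a predictive model to score candidate actions; the `choose_action` helper, the toy dynamics, and the toy utility are all illustrative assumptions.

```python
from typing import Callable, Iterable

State = float
Action = float

def choose_action(
    model: Callable[[State, Action], State],  # predictive world model: (state, action) -> predicted next state
    utility: Callable[[State], float],        # how much the agent values a predicted outcome
    state: State,
    candidate_actions: Iterable[Action],
) -> Action:
    """Pick the candidate action whose predicted outcome the agent values most."""
    return max(candidate_actions, key=lambda a: utility(model(state, a)))

# Toy usage: a one-dimensional world where actions nudge the state,
# and the agent prefers states near zero.
toy_model = lambda s, a: s + a   # stand-in for a learned model of the environment
toy_utility = lambda s: -abs(s)
print(choose_action(toy_model, toy_utility, state=3.0,
                    candidate_actions=[-2.0, -1.0, 0.0, 1.0]))  # prints -2.0
```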
In practice, the world models used by AI systems will be made out of a hierarchy of abstractions. A core problem in AI safety is to understand how we can tell which abstractions an AI system is using in its model.
A model of the world should be an object of the same type as the world. A natural starting point for saying whether something is a model of something else would therefore be whether the two are isomorphic. However, since an agent cannot typically hold a perfect representation of the world in its mind, isomorphism is not a useful definition of a model. Homomorphism is also usually too strict. Instead, we want a typical model to be allowed to be wrong to some degree. We expect useful definitions of world models to have some kind of error parameter to account for this, which could be a real number \(\epsilon > 0\), or a partial ordering over models if there are multiple incomparable ways in which they can be wrong.
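One simple way such an error parameter could enter a definition, purely as an illustration (this is not a standard definition): if the space of world-states carries a metric \(d\), one could call \(M\) an \(\epsilon\)-approximate model of the world \(W\) when its predicted transitions never differ from the world's actual transitions by more than \(\epsilon\):

\[
\forall s \in S,\; \forall a \in A: \quad d\bigl(M(s, a),\, W(s, a)\bigr) \le \epsilon .
\]

Exact correspondence is then the limiting case \(\epsilon = 0\), and comparing several such error terms along incomparable axes is one way the partial ordering mentioned above could arise.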
The world is clearly not a static thing, but it is not obvious how best to represent time in a model. Dynamical systems have the most straightforward representation: time is an ordered set (typically a discrete or continuous subset of the integers or reals) over which the state evolves. In computability theory, time could be taken to be the number of computation steps performed by a Turing machine (or some other model of computation). In a causal Bayes net, the passage of time corresponds to the values of the random variables being propagated along the DAG. In a very abstract setting, time could simply be the difference between the input of a function and its output.
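As a minimal illustration of the dynamical-systems picture, the following sketch treats time as nothing more than the index of iteration; the logistic-map update rule is just a stand-in for the world's dynamics.

```python
def evolve(step, initial_state, n_steps):
    """Iterate a discrete-time dynamical system; time is just the iteration index."""
    state = initial_state
    trajectory = [state]
    for _ in range(n_steps):
        state = step(state)  # one tick of time = one application of the update rule
        trajectory.append(state)
    return trajectory

# Toy dynamics: the logistic map, a standard example of a discrete dynamical system.
print(evolve(lambda x: 3.7 * x * (1 - x), 0.5, 5))
```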
Since a model is (in this context) something that will be inside an agent, it will presumably be a computational process. Therefore computability theory may be a helpful tool for defining models.
In a typical case, an agent would not start out knowing an accurate model of the world. Instead, it would have something resembling a model class of possible worlds, receive its observations over time, and perform something similar to Bayesian updating.
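A minimal sketch of this picture, assuming a finite model class and observations to which each candidate model assigns a probability (the coin example and the `bayes_update` helper are illustrative, not part of any specific proposal):

```python
def bayes_update(prior, likelihoods, observation):
    """Reweight a finite class of candidate world models by how well each predicted the observation."""
    unnormalized = {name: prior[name] * likelihoods[name](observation) for name in prior}
    total = sum(unnormalized.values())
    return {name: weight / total for name, weight in unnormalized.items()}

# Toy model class: two candidate worlds for a coin. Seeing heads shifts weight toward the biased model.
likelihoods = {
    "fair":   lambda obs: 0.5,
    "biased": lambda obs: 0.9 if obs == "heads" else 0.1,
}
posterior = bayes_update({"fair": 0.5, "biased": 0.5}, likelihoods, "heads")
print(posterior)  # {'fair': 0.357..., 'biased': 0.642...}
```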
Throughout the history of AI, the majority of theory has been about learning: machine learning, statistical learning theory, PAC learning, gradient descent, deep learning, singular learning theory, et cetera. Additionally, the practicality of learning world models is a large part of what determines the capabilities of an agent (the other major component being planning, or search).
These factors together mean that agent foundations is less often concerned with the learning process than you might expect. We expect that many of our desired results can be obtained under the assumption that the agent has a reasonable world model, whether it starts out with one or learns one by a process whose details are not essential to specify.
The fact that world models will need to be learned does place significant constraints on what they could be; namely, they need to be learnable using limited data and compute time.