Our research agenda

Dovetail’s mission is to help humanity safely navigate the creation of powerful AI systems.

Our broad approach is to help deconfuse researchers about AI and to help form a paradigm for agent foundations. Below is a high-level description of how we think this type of research can help; for a more concrete sense of how it might help, see the page on how theory can help AI safety.

Our research bets

Formalizing x-risk arguments

We and many others believe that the development of powerful AI systems poses an urgent existential risk to humanity. At present, however, this belief rests on informal conceptual arguments. Thus far, those arguments don’t provide very precise guidance on how to move forward safely (other than to not move forward at all). Moreover, while they are gaining significant traction, they are far from widely accepted, including among machine learning practitioners.

We believe that many aspects of these arguments can be formalized as mathematical theorems. (This is similar to, but separate from, the idea of “provably safe” AI or formally verifying software. Dovetail is not currently pursuing either of those types of research.) If this succeeds, it could dramatically increase our ability to safely develop AI technology, while at the same time improving coordination between parties and providing a firmer basis for regulation.
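As a purely illustrative sketch of what such a theorem might look like (the notation here is hypothetical, not an existing Dovetail result), consider the Goodhart-style fragment of the argument that optimizing a proxy hard enough tends to sacrifice the true objective. Let \(X\) be a space of outcomes, \(U : X \to \mathbb{R}\) the intended objective, \(V : X \to \mathbb{R}\) a proxy for it, and

\[
x^{*} \in \arg\max_{x \in X} V(x)
\]

the outcome a \(V\)-optimizer selects. A theorem in this genre would state conditions on the relationship between \(U\) and \(V\) under which \(U(x^{*})\) must be close to \(\max_{x \in X} U(x)\), and conditions under which it can be arbitrarily bad. Making such hypotheses explicit is what would turn a conceptual argument into precise guidance.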

Finding safe AI architectures

The space of all possible AI systems contains many safe AIs and many dangerous AIs. Much of the debate and uncertainty concerns the relative balance between the two, and how different selection processes (such as different types of reinforcement learning) shift the probability of landing on a safe or an unsafe system.
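One minimal way to state the question (a sketch with hypothetical notation, not a settled formalism): write \(\mathcal{S}\) for the space of possible systems, \(\mathrm{Safe} \subseteq \mathcal{S}\) for the safe ones, and \(D_T\) for the distribution over systems induced by a training or selection process \(T\). The quantity under debate is then

\[
\Pr_{s \sim D_T}\!\left[\, s \in \mathrm{Safe} \,\right],
\]

and how it varies as \(T\) varies; different intuitions about AI risk correspond to different beliefs about this function.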

One of our bets is that the safety-relevant dimensions of this space can be largely understood through general architectural properties of a system, such as how strong an optimizer it is, whether it is a “domain-limited” optimizer, or how close its architecture is to that of an ideal agent. Another core bet is that these properties can be efficiently understood using mathematical tools (rather than, say, empirical exploration of existing ML models). A third part of the bet is that the resulting mathematical conclusions can be practically applied to the AI systems society actually builds.
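To give a concrete sense of one such tool (this is a standard informal definition from the agent foundations literature, stated here as a sketch rather than a definition we are committed to): an agent’s strength as an optimizer can be measured in bits, as the improbability, under some baseline distribution, of doing at least as well as it does:

\[
\mathrm{OP}(o) \;=\; -\log_2 \Pr_{x \sim D_0}\!\left[\, U(x) \ge U(o) \,\right],
\]

where \(D_0\) is a baseline distribution over outcomes, \(U\) orders outcomes by the agent’s preferences, and \(o\) is the outcome the agent achieves. Definitions in this style let one compare how strong an optimizer two very different architectures are without inspecting their internals.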

Our research problems

These pages describe some of the problems that make up our research agenda.

Because agent foundations is pre-paradigmatic, these problems do not come with well-defined success criteria. Part of the work is to come up with definitions that become generally accepted as capturing the relevant properties of AI systems, and to find methods that can successfully be applied to the problems so defined.

Extending existing work

Because agent foundations is interdisciplinary, many existing results from other fields can be interpreted as results about agents, or could be extended into such results. Some of these we are actively working with, but most remain on our to-do or wish list. If you’re interested in working on extending any of these results, let us know!