Our Writings

Distilling the Internal Model Principle
Jose Faustino

This is the first part of a two-post series about the Internal Model Principle (IMP), which could be considered a selection theorem, and how it might relate to AI Safety, particularly to Agent Foundations research. In this first post, we construct a simplified version of the IMP that is easier to explain than the more general version, focusing on the key ideas and building intuition about the theorem's assumptions.

Daniel C et al.

I am working on a project about ontology identification. I've found conversations to be a good way to discover inferential gaps when explaining ideas, so I'm experimenting with using dialogues as the main way of publishing progress during the fellowship. We can frame ontology identification as a robust bottleneck for a wide variety of problems in agent foundations & AI alignment...

Alfred Harwood

In our discussions of the Agent-like Structure problem, lookup tables often come up as a useful counter-example or intuition pump for how a system could exhibit agent-like behaviour without agent-like structure. It is fairly intuitive that, in the limit of a large number of entries, a lookup table requires a longer program to implement than one which 'just' computes the function.
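This intuition can be made concrete with a toy sketch (my own illustration, not from the post): serialize a lookup table for f(n) = n² over more and more entries and compare its size to the fixed length of a short program computing the same function.

```python
# Toy illustration: description length of a lookup table vs. a short program.
# All names here (square, description_lengths) are made up for this sketch.
import pickle

def square(n: int) -> int:
    """A short program that 'just' computes the function."""
    return n * n

def description_lengths(num_entries: int) -> tuple[int, int]:
    """Return (lookup-table size, program size) in bytes for f(n) = n^2."""
    table = {n: square(n) for n in range(num_entries)}
    table_size = len(pickle.dumps(table))  # grows with num_entries
    program_size = len("def square(n): return n * n")  # constant
    return table_size, program_size

print(description_lengths(10))      # small table, small fixed program
print(description_lengths(10_000))  # much larger table, same fixed program
```

The table's description length grows roughly linearly in the number of entries, while the program's stays constant, which is the sense in which the lookup table eventually "loses".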

A Generalization of the Good Regulator Theorem
Alfred Harwood

We prove a version of the Good Regulator Theorem for a regulator with imperfect knowledge of its environment aiming to minimize the entropy of an output.
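As a minimal numerical sketch of the "minimize output entropy" framing (my own toy setup, not the post's formal one), consider an environment cycling through three states, with the output depending on both the state and the regulator's action. A regulator that effectively models the state can cancel it and drive the output entropy to zero; a regulator that ignores the state cannot.

```python
# Toy sketch of entropy-minimizing regulation; the setup is my own invention.
from collections import Counter
from math import log2

def entropy(outcomes):
    """Shannon entropy (bits) of the empirical distribution of outcomes."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

states = [s % 3 for s in range(300)]  # environment cycles through 3 states

# Regulator that "models" the state and cancels it: output is constant.
good = [(s + (-s) % 3) % 3 for s in states]
# Regulator that ignores the state: output inherits the state's entropy.
passive = [(s + 0) % 3 for s in states]

print(entropy(good))     # 0.0 bits
print(entropy(passive))  # log2(3) ≈ 1.585 bits
```

The good regulator achieves zero output entropy precisely because its action tracks the environment state, which is the flavor of conclusion the theorem makes precise.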

This is a post explaining the proof of the paper "Robust Agents Learn Causal World Models" in detail. See the previous post in the sequence for a higher-level summary and discussion of the paper, including an explanation of the basic setup (terminology and assumptions), which this post assumes throughout.

The selection theorems agenda aims to prove statements of the following form: "agents selected under criteria X have property Y," where Y is a property such as world models, general-purpose search, or modularity. We're going to focus on world models. But what is the intuition that makes us expect to be able to prove such things in the first place? Why expect world models?

The Good Regulator Theorem, published by Conant and Ashby in their 1970 paper, claims to show that 'every good regulator of a system must be a model of that system', though it is a subject of debate whether this is actually what the paper shows. It is a fairly simple mathematical result worth knowing about for people who care about agent foundations and selection theorems.

Abstractions are not Natural
Alfred Harwood

I think that the statement of the Natural Abstractions Hypothesis is not true and that whenever cognitive systems converge on using the same abstractions this is almost entirely due to similarities present in the systems themselves, rather than any fact about the world being 'naturally abstractable'. I tried to explain my view in a conversation and didn't do a very good job, so this is a second attempt.

A choice of variable in causal modeling is good if its causal effect is consistent across all the different ways of implementing it in terms of the low-level model. This notion can be made precise as a relation among causal models, giving us conditions for when we can ground the causal meaning of high-level variables in terms of their low-level representations. A distillation of (Rubenstein et al., 2017).

This is an edited transcription of the final presentation I gave for the AI safety camp cohort of early 2024. It describes some of what the project is aiming for, and some motivation. Here's a link to the slides. See this post for a more detailed and technical overview of the problem. This is the presentation for the project that is described as "does sufficient optimization imply agent...

A simple model of math skill
Alex Altair

I've noticed that when trying to understand a math paper, there are a few different ways my skill level can be the blocker. Some of these ways line up with typical levels of organization in math papers: Definitions: a formalization of the kind of objects we're even talking about. Theorems: propositions on what properties are true of these objects. Proofs: demonstrations that the theorems are...

In Clarifying the Agent-Like Structure Problem (2022), John Wentworth describes a hypothetical instance of what he calls a selection theorem. In Scott Garrabrant's words, the question is, does agent-like behavior imply agent-like architecture? That is, if we take some class of behaving things and apply a filter for agent-like behavior, do we end up selecting things with agent-like architecture...

Dealing with infinite entropy
Alex Altair

A major concession of the introduction post was limiting our treatment to finite sets of states. These are easier to reason about, and the math is usually cleaner, so it's a good place to start when trying to define and understand a concept. But infinities can be important. It seems quite plausible that our universe has infinitely many states, and it is frequently most convenient for even the...

We are excited to release a short course on AGI safety for students, researchers and professionals interested in this topic. The course offers a concise and accessible introduction to AI alignment, consisting of short recorded talks and exercises (75 minutes total) with an accompanying slide deck and exercise workbook. It covers alignment problems we can expect as AI capabilities advance, and our...

Introduction to abstract entropy
Alex Altair

Explanations of entropy tend to be concerned only with the application of the concept in their particular sub-domain. Here, I try to take on the task of synthesizing the abstract concept of entropy, to show what's so deep about it. Entropy is so fundamental because it applies far beyond our own specific universe: it applies in any system with different states.
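The "any system with different states" point can be illustrated with a small sketch (my example, not the post's): Shannon entropy is a property of a bare probability distribution over states, with no physical interpretation required.

```python
# Minimal sketch: entropy as an abstract property of distributions over states.
from math import log2

def shannon_entropy(probs):
    """Entropy in bits of a probability distribution over states."""
    return -sum(p * log2(p) for p in probs if p > 0)

uniform_4 = [0.25] * 4  # four equally likely states
certain = [1.0]         # one state known for sure

print(shannon_entropy(uniform_4))  # 2.0 bits = log2(number of states)
print(shannon_entropy(certain))    # 0.0 bits: no uncertainty
```

Whether the "states" are microstates of a gas, messages over a channel, or outcomes of a die, the same quantity applies, which is the abstraction the post develops.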