Our Writings

Distilling the Internal Model Principle
Jose Faustino

This is the first part of a two-post series about the Internal Model Principle (IMP), which could be considered a selection theorem, and how it might relate to AI Safety, particularly to Agent Foundations research. In this first post, we construct a simplified version of the IMP that is easier to explain than the more general version, focusing on the key ideas and building intuition about the theorem's assumptions.

Daniel C et al.

I am working on a project about ontology identification. I've found conversations to be a good way to discover inferential gaps when explaining ideas, so I'm experimenting with using dialogues as the main way of publishing progress during the fellowship. We can frame ontology identification as a robust bottleneck for a wide variety of problems in agent foundations & AI alignment...

Alfred Harwood

In our discussions of the Agent-like Structure problem, lookup tables often come up as a useful counter-example or intuition pump for how a system could exhibit agent-like behaviour without agent-like structure. It is fairly intuitive that, in the limit of a large number of entries, a lookup table requires a longer program to implement than one which 'just' computes the function.
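This intuition can be made concrete with a toy sketch (my own illustration, not from the post): serialize a lookup table for f(n) = n² over more and more entries and compare its size to the fixed length of a short program computing the same function.

```python
# Toy illustration: description length of a lookup table vs. a short program.
# All names here (square, description_lengths) are made up for this sketch.
import pickle

def square(n: int) -> int:
    """A short program that 'just' computes the function."""
    return n * n

def description_lengths(num_entries: int) -> tuple[int, int]:
    """Return (lookup-table size, program size) in bytes for f(n) = n^2."""
    table = {n: square(n) for n in range(num_entries)}
    table_size = len(pickle.dumps(table))  # grows with num_entries
    program_size = len("def square(n): return n * n")  # constant
    return table_size, program_size

print(description_lengths(10))      # small table, small fixed program
print(description_lengths(10_000))  # much larger table, same fixed program
```

The table's description length grows roughly linearly in the number of entries, while the program's stays constant, which is the sense in which the lookup table eventually "loses".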

A Generalization of the Good Regulator Theorem
Alfred Harwood

We prove a version of the Good Regulator Theorem for a regulator with imperfect knowledge of its environment aiming to minimize the entropy of an output.
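As a minimal numerical sketch of the "minimize output entropy" framing (my own toy setup, not the post's formal one), consider an environment cycling through three states, with the output depending on both the state and the regulator's action. A regulator that effectively models the state can cancel it and drive the output entropy to zero; a regulator that ignores the state cannot.

```python
# Toy sketch of entropy-minimizing regulation; the setup is my own invention.
from collections import Counter
from math import log2

def entropy(outcomes):
    """Shannon entropy (bits) of the empirical distribution of outcomes."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

states = [s % 3 for s in range(300)]  # environment cycles through 3 states

# Regulator that "models" the state and cancels it: output is constant.
good = [(s + (-s) % 3) % 3 for s in states]
# Regulator that ignores the state: output inherits the state's entropy.
passive = [(s + 0) % 3 for s in states]

print(entropy(good))     # 0.0 bits
print(entropy(passive))  # log2(3) ≈ 1.585 bits
```

The good regulator achieves zero output entropy precisely because its action tracks the environment state, which is the flavor of conclusion the theorem makes precise.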

This is a post explaining the proof of the paper "Robust Agents Learn Causal World Models" in detail. See the previous post in the sequence for a higher-level summary and discussion of the paper, including an explanation of the basic setup (terminology and assumptions), which this post assumes throughout.

The selection theorems agenda aims to prove statements of the following form: "agents selected under criteria X have property Y," where Y is a property such as world models, general-purpose search, or modularity. We're going to focus on world models. But what is the intuition that makes us expect to be able to prove such things in the first place? Why expect world models?

The Good Regulator Theorem, published by Conant and Ashby in their 1970 paper, claims to show that 'every good regulator of a system must be a model of that system', though it is a subject of debate whether this is actually what the paper shows. It is a fairly simple mathematical result worth knowing about for people who care about agent foundations and selection theorems.

Abstractions are not Natural
Alfred Harwood

I think that the statement of the Natural Abstractions Hypothesis is not true and that whenever cognitive systems converge on using the same abstractions this is almost entirely due to similarities present in the systems themselves, rather than any fact about the world being 'naturally abstractable'. I tried to explain my view in a conversation and didn't do a very good job, so this is a second attempt.

A choice of variable in causal modeling is good if its causal effect is consistent across all the different ways of implementing it in terms of the low-level model. This notion can be made precise as a relation among causal models, giving us conditions for when we can ground the causal meaning of high-level variables in terms of their low-level representations. A distillation of (Rubenstein et al., 2017).

This is an edited transcription of the final presentation I gave for the AI safety camp cohort of early 2024. It describes some of what the project is aiming for, and some motivation. Here's a link to the slides. See this post for a more detailed and technical overview of the problem. This is the presentation for the project that is described as "does sufficient optimization imply agent...

A simple model of math skill
Alex Altair

I've noticed that when trying to understand a math paper, there are a few different ways my skill level can be the blocker. Some of these ways line up with typical levels of organization in math papers: Definitions: a formalization of the kind of objects we're even talking about. Theorems: propositions on what properties are true of these objects. Proofs: demonstrations that the theorems are...

In Clarifying the Agent-Like Structure Problem (2022), John Wentworth describes a hypothetical instance of what he calls a selection theorem. In Scott Garrabrant's words, the question is, does agent-like behavior imply agent-like architecture? That is, if we take some class of behaving things and apply a filter for agent-like behavior, do we end up selecting things with agent-like architecture...

Dealing with infinite entropy
Alex Altair

A major concession of the introduction post was limiting our treatment to finite sets of states. These are easier to reason about, and the math is usually cleaner, so it's a good place to start when trying to define and understand a concept. But infinities can be important. It seems quite plausible that our universe has infinitely many states, and it is frequently most convenient for even the...

We are excited to release a short course on AGI safety for students, researchers and professionals interested in this topic. The course offers a concise and accessible introduction to AI alignment, consisting of short recorded talks and exercises (75 minutes total) with an accompanying slide deck and exercise workbook. It covers alignment problems we can expect as AI capabilities advance, and our...

Introduction to abstract entropy
Alex Altair

Explanations of entropy tend to be concerned only with the application of the concept in their particular sub-domain. Here, I try to take on the task of synthesizing the abstract concept of entropy, to show what's so deep about it. Entropy is so fundamental because it applies far beyond our own specific universe: it applies in any system with different states.
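The "any system with different states" point can be illustrated with a small sketch (my example, not the post's): Shannon entropy is a property of a bare probability distribution over states, with no physical interpretation required.

```python
# Minimal sketch: entropy as an abstract property of distributions over states.
from math import log2

def shannon_entropy(probs):
    """Entropy in bits of a probability distribution over states."""
    return -sum(p * log2(p) for p in probs if p > 0)

uniform_4 = [0.25] * 4  # four equally likely states
certain = [1.0]         # one state known for sure

print(shannon_entropy(uniform_4))  # 2.0 bits = log2(number of states)
print(shannon_entropy(certain))    # 0.0 bits: no uncertainty
```

Whether the "states" are microstates of a gas, messages over a channel, or outcomes of a die, the same quantity applies, which is the abstraction the post develops.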