2025 Dovetail Research Fellowship

This job is part of an Advanced Research + Invention Agency-funded project.

Dovetail is an agent foundations research group. We’ve recently received an ARIA grant to fund more team members over the next year. This application is for a 10-week fellowship, which may then be extended to 1 year full-time.[1] We’re especially looking for people based in the UK. This application will be open until Monday 15th September 2025.

You can read more about agent foundations and our research agenda on our website.

What the role might be like

Alfred and Alex will be leading a group of roughly 4 to 6 other people all engaged in mathematical AI safety research. We’re open to experience levels from undergraduates who are enthusiastic about math to post-docs who are looking for a way to transition into AI safety.[2] (We also welcome applicants outside academia!)

Some group members might work together, while others might do solo projects. All group members will have regular one-on-one check-ins with us, and we’ll also hold regular group meetings. We’ll spend our time finding useful formal definitions of relevant concepts, formulating and proving theorems about them, and communicating our ideas and results. We’ll find and share relevant papers, host read-through meetings, and discuss ideas with external researchers.

Here are some of the basic parameters of the fellowship.

  • £36,000 full-time equivalent, pro-rated
  • Full- to half-time. We can be flexible with hours if you need to fit this job around other work or study.
  • 10 weeks, with some fellowships extended to 1 year. It seems to take people roughly 6 weeks to get up to speed on enough agent foundations concepts to get a sense of how they relate to the research problems. We’re thinking of this fellowship as an extended work trial for potentially working together longer-term. Whether we invite you to continue working with us after the first 10 weeks will depend on how our collaboration goes (including how well your skills & interests fit our research agenda, and how well you fit into the group culture).
  • Remote-first (but happy to meet in person). Dovetail is an international, distributed team, so we don’t have an office. But Alfred and many of our group members will be in the UK, and Alex will be in the San Francisco Bay Area, so if you live in one of those two places we may be able to arrange regular in-person meetings.
  • Meeting 2-5 times per week. Especially in the beginning, we’d like to do a pretty large amount of syncing up. It can take a long time to convey all the aspects of the research problems. We also find that real-time meetings regularly generate new ideas. That said, some people find meetings worse for their productivity, and so we’ll be responsive to your particular work style.
  • An end-of-term write-up. It seems to take longer than 10 weeks to get results on the types of questions we’re interested in, but we think it’s good practice to commit to producing a write-up of some kind from the initial period. For those who stay for a year, the group will produce outputs on something like a quarterly basis.

The research problems

As with a lot of research in agent foundations, it’s quite difficult to concisely communicate what exactly we work on. Probably the best way to tell if you will be interested in our research problems is to read our research wiki or our write-ups on LessWrong, and then have a conversation with us about it.

All our research is purely mathematical,[3] rather than experimental or empirical. None of it involves machine learning per se, but the resulting theorems should apply to ML systems.

The domains of math that we’ve been focusing on include: dynamical systems of all kinds, probability theory, information theory, algorithmic information theory, measure theory, and ergodic theory. Things we’re interested in but less knowledgeable about include: singular learning theory, computational mechanics, abstract algebra, category theory, and reinforcement learning theory.

Here are some more concrete examples of projects you could work on.

  • Take theorem 10 from this information theory paper by Touchette & Lloyd and extend it to the case with multiple timesteps, or where we measure the change in utility rather than the change in entropy.
  • Try to solve John Wentworth and David Lorell’s Deterministic Natural Latents problem (and/or the related Deterministic Maximal Redunds problem), possibly using Alfred’s comment as a starting point.
  • Write an explainer for Brudno’s theorem.[4]
  • Adapt the main result from General Agents need World Models by Richens et al. This paper proves that a policy which achieves all goals from a certain set must contain enough information to reconstruct the whole environment. Find a more plausible set of goals that gives the same result, possibly by assuming additional structure to the environment.
  • Try to make progress on formalizing the fragility of value.
  • Do a literature review on the differences between the utility function formalism and the reinforcement learning reward formalism, explain exactly when they are and are not compatible, and discuss which existing results do or don’t apply across both.

If there is something else you are excited to work on, we are also open to hearing other project proposals, provided that they are within the scope of our research program.

Application process

If you’re interested, fill out this application form! You’re also welcome to email us with any questions. If you are unsure whether to submit an application, we encourage you to err on the side of applying. After that, the rest of the application steps are:

  • A short, conversational interview (20 min)
  • A longer interview (1h) where we talk about your and Dovetail’s research interests in more detail, and come up with some potential concrete projects.
  • Then, you go off and do some thinking & reading about your project ideas, and write a more detailed proposal. We’ll pay you £200 for this part, and you should spend roughly 6-10 hours on it.
  • A second longer interview (1h), where we go through your proposal.

After this, we should have a pretty good sense of whether we would work well together, and we’ll make a decision about whether to offer you the 10-week fellowship (or whatever else we may have negotiated). If you are successful, we expect that the time between the first interview and the job starting will be two weeks.

  1. We’re flexible with the specifics, such as if you would prefer to align the start and end dates with an academic calendar, or if you have existing obligations on your time. 

  2. To give some more detail on these bounds: we expect applicants to have mastery of some mathematical topic beyond the standard STEM classes (multivariable/vector calc, differential equations, linear algebra, and probability theory), but it doesn’t matter what, specifically. No one is an expert in agent foundations, so we’re really seeking what is sometimes called “mathematical maturity”. On the other side, if you have far more technical expertise than either of us, we may be willing to include you in the program if it seems mutually beneficial.

  3. More specifically, the desired results are mathematical. The ideas are almost all “pre-mathematical”, in that the first part will be to translate the ideas into the appropriate formalisms. 

  4. A. A. Brudno, Entropy and the complexity of the trajectories of a dynamical system (1983)