For a more rigorous treatment of this research problem, see Towards a formalization of the agent structure problem.
The agent structure problem is a pre-formal conjecture that if a system has agent-like behavior, it probably has agent-like structure. For more on this distinction, see Behavior versus structure.
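To make the shape of the claim concrete, here is one purely illustrative schema (the symbols $\pi$, $\mathcal{D}$, $U$, $\Pi_{\mathrm{agent}}$, $d$, and $\delta$ are placeholders we are assuming for exposition; they are not the definitions from the post linked above):

$$
% illustrative placeholders only, not the linked post's actual definitions
\mathbb{E}_{E \sim \mathcal{D}}\!\left[ U(\pi, E) \right] \ge u
\;\Longrightarrow\;
d\!\left( \pi, \Pi_{\mathrm{agent}} \right) \le \delta(u)
$$

Read this as: any system $\pi$ whose expected performance $U$ across environments drawn from $\mathcal{D}$ exceeds $u$ must lie within distance $\delta(u)$ of the set of agent-structured systems, with $\delta(u)$ shrinking as the performance bar $u$ rises.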
The agent structure problem is an idea with substantial precedent in research predating agent foundations.
Within the field of agent foundations, Scott Garrabrant wrote a short post about the idea in 2019. It gained wider attention when John Wentworth described his version of it under the name “the agent-like structure problem” and offered a bounty for progress.
We wrote about our version in Towards a formalization of the agent structure problem. That post remains an accurate account of how the problem fits into Dovetail’s research agenda.
Working out exactly how the agent structure problem is true or false will likely yield a fair amount of deconfusion about agents and optimization. That alone makes it a natural fit for the study of agent foundations.
If we accept the argument that agents are a much more dangerous type of AI system, then it would be very useful to be able to tell which systems are agentic, and to measure exactly how agentic they are. If the agent structure problem resolves mostly true, then it tells us we can bound how agentic a system is by observing only its external behavior. ML systems are usually trained to meet a certain “loss” or performance metric, which could plausibly be translated into the agent-like behavior condition of the problem statement.
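As a hypothetical sketch of that translation (the loss functional $\mathcal{L}$, threshold $\ell$, and monotone map $u(\ell)$ are assumptions for illustration, not part of the problem statement):

$$
% hypothetical chain; \mathcal{L}, \ell, and u(\ell) are assumed for illustration
\mathcal{L}(\pi) \le \ell
\;\Longrightarrow\;
\mathbb{E}_{E \sim \mathcal{D}}\!\left[ U(\pi, E) \right] \ge u(\ell)
\;\Longrightarrow\;
d\!\left( \pi, \Pi_{\mathrm{agent}} \right) \le \delta(u(\ell))
$$

That is, a guarantee on training loss would become the behavioral premise of the conjecture, and the conjecture would then convert it into a bound on structure.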
Moreover, whatever form a positive resolution takes, it will be a fairly strong argument for the naturalness of the particular definition of agent used in the proof.
If the agent structure problem is essentially false, that is also very interesting and potentially useful for AI safety: it would mean there can be systems that perform reliably well across a wide range of environments and yet are not agent-structured at all, which in turn means they are less dangerous! Having some formal insight into particular kinds of safe and useful systems could help give society the confidence needed to build them responsibly. It would also help reduce the “alignment tax” and take some pressure off the economic incentive to build potentially dangerous systems.