Domain-limited optimization

There are several ways to measure the strength of an optimizing system, and one of them may be called the “domain” of optimization. An optimizing system may optimize only along some lower-dimensional subset of its state space. This can hold even if the optimization is arbitrarily strong by the other measures.
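As a toy illustration only, and not Dovetail's formalism, the Python sketch below assumes the state space is R^n and treats the "domain" as a fixed set of coordinate indices that the optimizer may move. The function `domain_limited_descent`, its parameters, and the quadratic objective are hypothetical names and choices made for this example.

```python
from typing import Callable


# Hypothetical toy model: the state is a point in R^n, and the "domain"
# is a fixed set of coordinate indices the optimizer is allowed to move.
def domain_limited_descent(
    state: list[float],
    grad: Callable[[list[float]], list[float]],
    domain: set[int],
    step: float = 0.1,
    iters: int = 1000,
) -> list[float]:
    """Gradient descent that only ever updates coordinates in `domain`."""
    x = list(state)  # copy so the caller's state is not mutated
    for _ in range(iters):
        g = grad(x)
        for i in domain:
            # Only in-domain coordinates are updated; all others are
            # never written, however many iterations run.
            x[i] -= step * g[i]
    return x


# Objective: sum of (x_i - 3)^2 over all four coordinates, so an
# unrestricted optimizer would move every coordinate to 3.
def grad_fn(v: list[float]) -> list[float]:
    return [2.0 * (vi - 3.0) for vi in v]


result = domain_limited_descent([0.0, 0.0, 0.0, 0.0], grad_fn, domain={0, 1})
print(result)  # coordinates 0 and 1 approach 3.0; coordinates 2 and 3 stay exactly 0.0
```

Raising `iters` makes the optimization stronger within its domain, but it never changes which coordinates can move. In this sketch the limit is imposed by a hard-coded index set; the harder case the text has in mind is knowing, from an understanding of a system's dynamics, that its optimization stays within such a subset.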

At Dovetail, we hypothesize that very powerful optimizers can be built safely if they are built as domain-limited optimizers. We speculate that systems theory can be used to properly define and understand this type of optimization.

Since domain-limited optimization is very powerful within its domain, it can be very useful without being nearly as dangerous as unbounded optimization. If we knew how to construct optimizers limited in this way, society could reap much of the benefit of AI without incurring existential risks.

Analogy with energy

Physical energy is a measure of how much a system’s state could potentially change if the energy were released in a particular way. Huge amounts of energy are therefore, in some sense, quite dangerous! But humanity has figured out how to reliably harness, store, and distribute awe-inspiring quantities of energy. We know how to build systems in which the energy has a constrained domain. For example, electricity stays within wires. Even though the energy going through the wires could, in some counterfactual sense, cause damage if it went far outside the wires, we understand the system’s physics well enough to know that it’s not the kind of system where that could happen.

We believe there may be an analogous situation for optimization: optimizing systems could be understood well enough that, even if they apply optimization power at levels that would be dangerous if applied in arbitrary domains, we can know they are not the kind of system where that counterfactual could happen.

Metastability

In some systems, optimization processes may be temporarily domain-limited through metastability. Metastability is not very well understood, but the timescales over which metastable states persist are often super-exponential.