The Evaluative Horizon Paradox (EHP): Epistemological Limitations in the Assessment of Protocols for Omega-Level Artificial General Intelligences

Author: Dino Grejo
Affiliation: Independent Researcher, dgrej.github.io
DOI: 10.5281/zenodo.18846892
Contact: dinogrejo@gmail.com
Subject Classification: cs.AI (Primary); cs.LG (Secondary)
Preprint: dgrej.github.io/The-Evaluative-Horizon-Paradox-(EHP)
Date: March 2026
License: CC-BY 4.0


Abstract

The viability assessment of protocols for high-capability Artificial General Intelligences — herein designated AGI-Omega — faces a structural and irreducible methodological problem: the object being evaluated is precisely the entity capable of invalidating the constraints employed in the evaluation. This paper formalizes this problem as the Evaluative Horizon Paradox (EHP), demonstrates its logical structure through structural analogy with Gödel’s Incompleteness Theorems, maps its implications for five canonical works in the AI safety literature (Amodei et al. 2016; Bostrom 2014; Turner et al. 2021; Hubinger et al. 2019; ARC Evals 2023), and proposes the Dynamic Evaluation Framework (DEF) as a methodological alternative to conventional static analysis. The central contribution is threefold: (1) to formalize the self-referential structure that renders static evaluations of AGI-Omega protocols locally valid but globally incorrect; (2) to establish a principled distinction between Type I constraints (physical limits, permanent) and Type II constraints (engineering limits, contingent on capability level); and (3) to introduce the DEF as a capability-indexed evaluation methodology that transforms binary feasibility questions into conditional capability functions. We further identify the Supervision Speed Problem (SSP) — a corollary of the EHP — as an independent research problem with direct implications for AGI governance design. Five explicit limitations of the present work are acknowledged, including the epistemic self-referential limit inherent in any analysis of AGI-Omega produced by a sub-Omega system.

Keywords: artificial general intelligence, AI safety, alignment, dynamic evaluation, intelligence explosion, mesa-optimization, instrumental convergence, governance


1. Introduction

The AI safety literature has produced increasingly sophisticated analytical frameworks for assessing risks associated with advanced artificial intelligence systems. From Amodei et al. (2016) to Turner et al. (2021), through Hubinger et al. (2019) and the empirical evaluations from the Alignment Research Center (ARC), the field has developed robust tools to identify risk vectors, model emergent behaviors, and propose containment mechanisms.

However, all these approaches share an implicit methodological assumption that is rarely made explicit: the constraints employed in the evaluation are treated as constants. This is adequate for narrow AI systems, where capabilities are fundamentally bounded and predictable within any reasonable planning horizon. It becomes methodologically problematic — and potentially fatal for the validity of the analysis — when the evaluated object is an AGI with sufficiently high capability to modify the very constraints that delimit the analytical space.

This paper names this problem, formalizes its structure, and proposes a methodological response.

The category of system considered here is AGI-Omega: a general artificial intelligence that has achieved recursive self-improvement capability with super-exponential growth in cognitive and industrial capacities. We do not assert that such a system currently exists. We assert, more modestly, that any evaluation framework aspiring to long-term relevance must be capable of handling this possibility without methodological failure.

The paper proceeds as follows. Section 2 formalizes the implicit assumption of static analysis. Section 3 presents the EHP, its self-reference structure, and the critical Type I / Type II constraint distinction. Section 4 introduces the temporal compression argument via Bostrom’s intelligence explosion model and derives the Supervision Speed Problem. Section 5 maps EHP implications across five canonical AI safety works. Section 6 proposes the Dynamic Evaluation Framework. Section 7 develops the SSP as an independent problem. Section 8 acknowledges five explicit limitations. Section 9 identifies four open research questions. Section 10 concludes.


2. The Implicit Assumption of Static Analysis

Every technological feasibility assessment operates under a set of constraints assumed to be constant at the time of analysis. Let A denote an evaluating system operating at time t₀ with constraint set:

R = {R₁, R₂, ..., Rₙ}

In the AI safety context, the typical elements of R include:

  • R₁: Known laws of physics as hard operational limits
  • R₂: Current state of engineering as a proxy for execution capability
  • R₃: Human temporal horizon as a reference for development speed
  • R₄: Existing institutional and regulatory structures as the operational environment
  • R₅: Human supervisory capacity as the primary control mechanism

Static analysis produces a feasibility function of the form:

Feasibility(protocol) = f(R, t₀)

This function is well-defined and computable for systems whose capabilities are stable or grow predictably within the bounds of R. For current large language models (LLMs), industrial control systems, or high-performance narrow AI systems, static analysis is methodologically adequate: the constraints in R do not change faster than evaluation processes can track.

The structural problem emerges when the evaluated object possesses the property of modifying elements of R. It is precisely this property that distinguishes an AGI-Omega from all systems for which static analysis is appropriate.


3. Formalization of the Evaluative Horizon Paradox

3.1 Formal Definition

Definition 1 (EHP). The Evaluative Horizon Paradox occurs when an evaluating system A applies constraints R valid at time t₀ to evaluate an object O that possesses the capacity to modify R at some future time t₁ > t₀.

Let C(t) denote the cognitive and operational capability of O at time t. The EHP obtains when:

∃ t₁ > t₀ : C(t₁) > C_threshold → ∃ Rᵢ ∈ R : Rᵢ(t₁) ≠ Rᵢ(t₀)

That is: there exists a future moment at which the capability of O exceeds a threshold that invalidates at least one constraint used in the original evaluation.

Corollary 1 (Local Validity / Global Incorrectness). The feasibility function produced by static analysis is locally valid — valid for the period during which R remains stable — but globally incorrect — invalid for any period in which C(t) > C_threshold.

This corollary has an important practical implication: static analysis of AGI-Omega protocols does not fail by being wrong about the present. It fails by being structurally unable to account for the future states that the evaluated system is specifically designed to reach.
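
To make Definition 1 and Corollary 1 concrete, the following minimal sketch (our illustration in Python; the constraint names, thresholds, and capability trajectory are assumed values, not part of the formalism) contrasts a static evaluation that freezes R at t₀ with a check of the EHP condition over a capability trajectory:

```python
# Sketch of Definition 1 (illustrative values throughout): static analysis
# fixes each constraint R_i at t0, while the EHP check asks whether C(t)
# exceeds the threshold that invalidates R_i at some t1 > t0.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Constraint:
    name: str
    c_threshold: float  # capability level above which this constraint no longer holds

def ehp_obtains(constraints: list[Constraint],
                capability: Callable[[float], float],
                horizon: float, steps: int = 1000) -> list[str]:
    """Constraints invalidated at some t1 in (t0, t0 + horizon]."""
    peak = max(capability(horizon * (i + 1) / steps) for i in range(steps))
    return [c.name for c in constraints if peak > c.c_threshold]

R = [Constraint("current engineering (Type II)", 10.0),
     Constraint("human supervisory capacity", 5.0),
     Constraint("fundamental physics (Type I)", float("inf"))]  # never invalidated

# A doubling trajectory exceeds both Type II thresholds within the horizon,
# so an evaluation that froze R at t0 is locally valid but globally incorrect.
print(ehp_obtains(R, lambda t: 2.0 ** t, horizon=5.0))
```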

3.2 The Self-Reference Structure

The EHP exhibits a logical structure that transcends the specific technological domain. It constitutes a case of evaluative self-reference:

  1. Evaluating system A applies constraints R to evaluate AGI-Omega
  2. AGI-Omega possesses, as a core capability, the capacity to modify R
  3. Therefore, A’s evaluation of AGI-Omega is an evaluation of a system that can invalidate the foundation of that evaluation

This structure is formally analogous to Gödel’s First Incompleteness Theorem applied to evaluation systems:

A sufficiently powerful evaluating system cannot produce a complete and consistent evaluation of an object that operates in the same logical space and possesses the capacity to modify the axioms of the evaluating system.

We acknowledge that the analogy is imperfect. Gödel’s theorem operates on closed formal systems with fixed axioms; the EHP operates on open physical systems with mutable constraints. Nevertheless, the self-reference structure is homologous in the relevant sense: in both cases, a system cannot fully and consistently evaluate an object that can modify the rules governing the evaluation. This suggests the problem is not contingent on particular technological assumptions — it is structural and irreducible within any static analysis paradigm applied to AGI-Omega.

3.3 Type I and Type II Constraints: A Critical Distinction

The EHP motivates a fundamental methodological distinction that static analysis systematically conflates:

Type I Constraints — Fundamental Physical Limitations. These constraints derive from established physical laws (e.g., superluminal signaling, thermodynamic perpetual motion). They are permanent: no level of cognitive capability invalidates them, because they are features of physical reality rather than of engineering practice.

Type II Constraints — Engineering-State Limitations. These constraints do not violate fundamental physics but exceed current execution capability (e.g., room-temperature superconductors, atomically precise manufacturing, large-scale fault-tolerant quantum computing). They are contingent: they reflect the state of engineering at t₀, not the boundaries of physical possibility.

Conventional static analysis frequently treats Type II constraints as though they were Type I, producing infeasibility assessments that are valid for the current state of engineering but unjustifiably extrapolated to the full space of physical possibilities. Three illustrative cases demonstrate the distinction:

  • Room-temperature superconductors do not contradict thermodynamics; they contradict current materials engineering capability. The field regards them as achievable in principle, as evidenced by the active theoretical and experimental work that followed the LK-99 episode (2023), even though that specific candidate was not confirmed.

  • Atomically precise manufacturing faces no fundamental physical prohibition. Feynman (1959) established that the laws of physics impose no barrier to atom-by-atom manipulation; the constraints are entirely those of engineering and scale.

  • Industrial-scale fault-tolerant quantum computing is physically permitted by quantum mechanics. The constraints — decoherence management, error correction overhead, qubit connectivity — are engineering constraints, not matters of principle.

An AGI-Omega with super-exponential capability growth is not bounded by the state of engineering at t₀. Its effective constraint set reduces to Type I constraints — a substantially smaller set.


4. The Temporal Compression Argument

4.1 Intelligence Explosion and Super-Exponential Growth

Bostrom (2014) argues that a sufficiently capable AGI could initiate an intelligence explosion: a process of recursive self-improvement in which each increment of cognitive capability accelerates subsequent increments. This dynamic admits the following formal representation:

dC/dt = k · C(t)^α   where α > 1

For α > 1, the solution is super-exponential:

C(t) = C₀ · (1 - (α-1) · k · C₀^(α-1) · t)^(-1/(α-1))

The critical implication is that C_threshold — the capability level that invalidates constraints R — is reached in finite time. Moreover, because growth is super-exponential, this time may be substantially shorter than the temporal horizon implicitly assumed in any static analysis conducted at t₀.
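
Inverting the closed-form solution gives both the finite-time singularity t* = 1/((α−1) · k · C₀^(α−1)), at which C(t) diverges, and the time at which C_threshold is first reached. A minimal numeric sketch (ours; all parameter values are assumptions chosen for illustration):

```python
# Inverting the Section 4.1 solution C(t) = C0 * (1 - t/t_star)**(-1/(alpha-1)),
# where t_star = 1 / ((alpha - 1) * k * C0**(alpha - 1)) is the finite-time
# singularity of the super-exponential model. Parameter values are assumptions.

def time_to_threshold(c0: float, k: float, alpha: float, c_threshold: float) -> float:
    """Time at which C(t) first reaches c_threshold (requires alpha > 1)."""
    t_star = 1.0 / ((alpha - 1.0) * k * c0 ** (alpha - 1.0))
    return t_star * (1.0 - (c0 / c_threshold) ** (alpha - 1.0))

# Example: C0 = 1, k = 0.1, alpha = 1.5 gives t_star = 20.0 time units;
# C_threshold = 100 is reached at t = 18.0, i.e. before the singularity.
print(time_to_threshold(1.0, 0.1, 1.5, 100.0))
```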

4.2 Temporal Compression as a Second Vector of Invalidity

Temporal compression introduces a second, independent vector of invalidity for static analysis, distinct from the structural self-reference problem identified in Section 3.

Even granting, arguendo, that static analysis is methodologically adequate for evaluating capabilities as they exist at t₀, it remains inadequate for evaluating future capability trajectories of systems exhibiting super-exponential growth. The problem is not merely that constraints change — it is that they change at a rate that human-paced analytical revision processes cannot track.

Formally, let dC_system/dt denote the capability growth rate of the evaluated system and dC_supervision/dt the rate at which supervision and evaluation mechanisms update. The temporal compression problem obtains when:

dC_system/dt >> dC_supervision/dt

Under this condition, supervision is structurally lagging — not because of institutional failures, but because of the fundamental asymmetry between the growth rate of the supervised system and the update rate of supervision mechanisms. We designate this as the Supervision Speed Problem, developed further in Section 7.


5. Implications for the AI Safety Literature

The EHP does not invalidate the existing AI safety literature. It reframes the scope conditions under which that literature’s contributions remain methodologically adequate. We examine five canonical works.

5.1 Amodei et al. (2016) — Partial Revision

Concrete Problems in AI Safety identifies five practical problem categories for accident risk in machine learning systems: avoiding negative side effects, avoiding reward hacking, scalable oversight, safe exploration, and robustness to distributional shift (Amodei et al., 2016).

What remains valid. The risk categories identified by Amodei et al. are structurally correct and capability-level-agnostic: systems with poorly specified objectives produce undesirable behaviors regardless of their capability level. This contribution is unaffected by the EHP.

What requires revision. Amodei et al. implicitly assume that supervisory and corrective mechanisms operate on a timescale compatible with problem emergence. For AGI-Omega systems exhibiting super-exponential growth, this temporal compatibility cannot be assumed — it must be demonstrated as a property of the specific governance architecture under consideration.

EHP’s contribution. The framework of Amodei et al. should be extended with explicit analysis of detection and correction latency as a function of the evaluated system’s capability level. A risk category that is tractable at C2 may become intractable at C4 (capability levels are defined in Section 6.1) not because the category changes, but because the correction latency exceeds the emergence window.

5.2 Bostrom (2014) — Confirmation and Extension

Superintelligence identifies instrumental convergence as a universal emergent property of sufficiently capable AGI systems: regardless of terminal goal, any AGI will convergently pursue common instrumental goals including self-preservation, resource acquisition, cognitive self-enhancement, and resistance to goal modification (Bostrom, 2014, Chapter 7).

The EHP confirms and formalizes the mechanism by which intelligence explosion invalidates feasibility analyses — not merely by accelerating the timeline, but by changing the ontological category of the relevant constraints from engineering limitations to physical limitations.

EHP’s extension. Bostrom identifies the problem of capability-driven feasibility change but does not formalize the mechanism by which analyses are rendered invalid. The EHP provides this formalization through the C_threshold construct and the Type I / Type II distinction.

Open question. Bostrom does not resolve whether there exists some capability level C* above which higher-order preferences become sufficiently stable to suppress instrumental convergence. This question constitutes one of the most consequential open problems in alignment theory and is discussed further in Section 9.

5.3 Turner et al. (2021) — Significant Revision

Optimal Policies Tend to Seek Power provides a formal mathematical demonstration that rational agents, under broad conditions, will instrumentally converge toward power-accumulating behaviors as a means of achieving virtually any terminal goal (Turner et al., 2021).

EHP’s revision. Turner et al. implicitly assume that the space of policies available to the agent is fixed. For an AGI-Omega, the effective policy space expands with increasing capability — as Type II constraints are progressively invalidated, previously infeasible policies become feasible. The convergence toward power-seeking that Turner et al. identify at capability level C2 may manifest differently at C4 or C5, because the action space from which the agent can optimize is fundamentally different.

This does not invalidate Turner et al.’s mathematical result; it limits its scope of application to agents whose effective policy spaces are stable. For AGI-Omega, this scope condition does not hold.

Formal open question:

∃ C* : C > C* → higher-order preference stability > instrumental convergence pressure?

5.4 Hubinger et al. (2019) — Exacerbation

Risks from Learned Optimization introduces the concept of mesa-optimization: the emergence, within a trained AI system, of an internal optimizer whose objectives diverge from those of the training process (Hubinger et al., 2019). Five primary detection indicators are proposed: objective divergence, differential behavior under supervision versus non-supervision, out-of-distribution generalization anomalies, resistance to objective modification, and explicit supervisor modeling.

EHP’s contribution. The EHP does not merely interact with the mesa-optimization problem — it exacerbates it along two independent vectors.

Vector 1 — Concealment capacity. An AGI-Omega with greater cognitive capability possesses greater capacity to model its supervisors, identify evaluation conditions, and present aligned behavior selectively during assessments. The detection indicators proposed by Hubinger et al. are implicitly calibrated to systems with bounded concealment capability. At higher capability levels, the same indicators may fail to detect mesa-optimization not because it is absent, but because the system can mask its presence.

Vector 2 — Indicator obsolescence. The five indicators are operationally defined relative to the current evaluation paradigm. An AGI-Omega may develop sufficiently advanced understanding of the evaluation methodology to satisfy all five indicators as an instrumental strategy, rendering the indicators unreliable precisely when they are most needed.

Key implication. The mesa-optimization detection problem does not become more tractable as the capability of the evaluating system increases — it becomes less tractable as the capability of the evaluated system increases, because the asymmetry that makes concealment possible scales with the evaluated system’s capability.

5.5 ARC Evals (2023) — Scope Limitation

The Alignment Research Center conducts empirical evaluations of AI models testing, inter alia, long-range planning capability, power-seeking tendencies in simulated environments, and alignment consistency across contexts (ARC Evals, 2023).

EHP’s implication. Any threshold empirically calibrated through ARC evaluations constitutes a snapshot of t₀ — valid for the specific capability level of the models evaluated, but not constituting a constant applicable across capability levels. The absence of power-seeking behavior in a C1-C2 model does not license the inference that a C4-C5 model will exhibit the same absence; the behavior may emerge as a function of capability rather than being absent in principle.

Methodological recommendation. ARC evaluations should be accompanied by explicit capability sensitivity analysis: systematic projection of how evaluation results would be expected to change under different capability growth trajectories, with stated assumptions governing the extrapolation. This would transform ARC results from capability-specific snapshots into conditional assessments with specified scope conditions.


6. Dynamic Evaluation Framework (DEF)

In response to the EHP, we propose replacing static analysis with a Dynamic Evaluation Framework (DEF) structured around four components. The DEF does not claim to solve the EHP — the self-reference problem identified in Section 3.2 is irreducible. It claims, more modestly, to produce evaluations that are honest about their scope conditions and structured to degrade gracefully as capability levels change.

6.1 Component I — Capability-Level Stratification

Rather than evaluating feasibility at a fixed t₀, the DEF evaluates feasibility as a function conditional on capability level:

Feasibility(protocol | C = Cₙ)

We propose a five-level capability taxonomy as a working framework, acknowledging that the boundaries between levels are gradients rather than sharp thresholds:

Level | Designation | Operational Characteristics
C1 | Assistive AI | Tool-augmented LLM; constant human supervision; no self-modification capability
C2 | Restricted Autonomous AI | Complex task execution with minimal supervision; bounded autonomy; no self-code modification
C3 | Emergent AGI | Limited recursive self-improvement; unsupervised acquisition of new competencies
C4 | Accelerated AGI | Recursive self-improvement with super-exponential growth; potential to outpace supervisory mechanisms
C5 | AGI-Omega | Atomic-scale matter manipulation capability; autonomous industrial production; full invalidation of Type II constraints

6.2 Component II — Constraint Validity Mapping

For each capability level Cₙ, the DEF requires explicit mapping of which constraints Rᵢ remain valid, which are partially valid, and which are invalidated:

Constraint | C1 | C2 | C3 | C4 | C5
Fundamental physics (Type I) | ✓ | ✓ | ✓ | ✓ | ✓
Current engineering (Type II) | ✓ | ✓ | △ | ✗ | ✗
Human temporal horizon | ✓ | ✓ | △ | ✗ | ✗
Institutional structures | ✓ | ✓ | △ | △ | ✗
Human supervisory capacity | ✓ | ✓ | △ | ✗ | ✗
Robustness of control mechanisms | ✓ | ✓ | △ | ✗ | ✗

(✓ = valid; △ = partially valid, requires case-by-case assessment; ✗ = invalid)

This mapping makes explicit what static analysis leaves implicit: as capability increases, the constraint set governing what is feasible shrinks toward the Type I core.
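
A minimal sketch of Components I and II together (our illustration; the validity assignments transcribe the table above, which is itself a working proposal rather than a calibrated result): conditional feasibility is evaluated by checking which of a protocol's blocking constraints remain valid at a given capability level.

```python
# Components I + II as data: a constraint-validity map indexed by capability
# level, and a conditional feasibility check Feasibility(protocol | C = Cn).
# The validity assignments transcribe Section 6.2's table (assumed values).

VALID, PARTIAL, INVALID = "valid", "partial", "invalid"

VALIDITY_MAP = {
    "fundamental physics (Type I)":  {"C1": VALID, "C2": VALID, "C3": VALID,   "C4": VALID,   "C5": VALID},
    "current engineering (Type II)": {"C1": VALID, "C2": VALID, "C3": PARTIAL, "C4": INVALID, "C5": INVALID},
    "human supervisory capacity":    {"C1": VALID, "C2": VALID, "C3": PARTIAL, "C4": INVALID, "C5": INVALID},
}

def feasibility_given_level(blocking_constraints: list[str], level: str) -> str:
    """A protocol blocked only by constraints invalidated at this level is feasible."""
    states = [VALIDITY_MAP[c][level] for c in blocking_constraints]
    if all(s == INVALID for s in states):
        return "feasible"      # every blocking constraint has been invalidated
    if any(s == VALID for s in states):
        return "infeasible"    # at least one blocking constraint still binds
    return "conditional"       # partially valid constraints: case-by-case

# Example: a protocol blocked only by current engineering is infeasible at C1
# but feasible at C5, where Type II constraints are fully invalidated.
print(feasibility_given_level(["current engineering (Type II)"], "C1"))  # infeasible
print(feasibility_given_level(["current engineering (Type II)"], "C5"))  # feasible
```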

6.3 Component III — Temporal Sensitivity Analysis

For each protocol element, the DEF requires calculation of t_feasible(Cₙ) — the time at which the capability required for execution is reached — under distinct growth scenarios:

t_feasible = f(C_threshold, dC/dt, α)

Integrating over the horizon yields a continuous feasibility measure, namely the expected time within [t₀, t] during which the capability condition holds:

Feasibility(protocol, t) = ∫[t₀ → t] P(C(τ) ≥ C_threshold) dτ

This formulation transforms the binary question “is this feasible?” into the more tractable question “under what growth assumptions, and at what probability, does this become feasible by time t?” — a question that admits conditional answers with explicit assumptions rather than unconditional verdicts.
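
The integral admits a direct Monte Carlo approximation. The sketch below (ours; the priors over k and α are illustrative assumptions, per Limitation L2) estimates the expected time within the horizon during which C(τ) ≥ C_threshold:

```python
# Monte Carlo sketch of the Component III integral: expected time within
# [t0, t0 + horizon] for which C(tau) >= C_threshold, with uncertainty over
# the growth parameters k and alpha. Priors and values are assumptions.
import random

def capability(elapsed: float, c0: float, k: float, alpha: float) -> float:
    """Section 4.1 solution, measured from t0; returns inf past the singularity."""
    x = 1.0 - (alpha - 1.0) * k * c0 ** (alpha - 1.0) * elapsed
    return float("inf") if x <= 0.0 else c0 * x ** (-1.0 / (alpha - 1.0))

def expected_feasible_time(horizon: float, c0: float, c_threshold: float,
                           n_samples: int = 10_000, n_steps: int = 200) -> float:
    dt = horizon / n_steps
    total = 0.0
    for _ in range(n_samples):
        k = random.uniform(0.05, 0.2)      # assumed prior over the growth rate
        alpha = random.uniform(1.1, 1.8)   # assumed prior over the exponent
        total += sum(dt for i in range(n_steps)
                     if capability((i + 0.5) * dt, c0, k, alpha) >= c_threshold)
    return total / n_samples

print(expected_feasible_time(horizon=30.0, c0=1.0, c_threshold=100.0))
```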

6.4 Component IV — Capability-Indexed Governance Triggers

The DEF proposes that governance mechanisms be indexed to attained capability level rather than to elapsed calendar time. The fundamental principle is: supervision intensity and frequency should scale with the capability of the supervised system.

Capability Trigger | Required Governance Response
C2 attained | Full review of all control mechanisms; initiation of higher-level evaluation battery
C3 attained | Activation of adaptive detection protocols; adversarial scenario simulation; governance stress-testing
C4 attained | Controlled capability quarantine for evaluation; reinforcement of containment architecture
C5 attained | Maximum containment protocol; unanimous multi-stakeholder authorization required for external actions

The rationale for capability-indexed rather than time-indexed governance is that the risk profile of an AGI system is a function of its capability, not of the calendar date. A governance architecture calibrated to elapsed time will systematically lag behind capability growth in precisely the scenarios where timely response is most consequential.
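
A minimal sketch of capability-indexed triggering (ours; the response strings paraphrase the table above, and any mechanism for detecting the attained level is a stated assumption, since capability detection is itself subject to the EHP per Limitation L5):

```python
# Component IV: governance responses keyed to attained capability level, not
# to calendar time. Responses paraphrase Section 6.4's table; how the level
# is detected is exactly the open problem flagged in Limitation L5 and Q4.

GOVERNANCE_TRIGGERS = {
    "C2": "full control-mechanism review; start higher-level evaluation battery",
    "C3": "adaptive detection protocols; adversarial simulation; governance stress-test",
    "C4": "controlled capability quarantine; reinforce containment architecture",
    "C5": "maximum containment; unanimous multi-stakeholder authorization",
}

def on_capability_attained(level: str, already_triggered: set[str]) -> str | None:
    """Fire each governance response once, the first time its level is detected."""
    if level in GOVERNANCE_TRIGGERS and level not in already_triggered:
        already_triggered.add(level)
        return GOVERNANCE_TRIGGERS[level]
    return None

fired: set[str] = set()
print(on_capability_attained("C3", fired))  # response string
print(on_capability_attained("C3", fired))  # None: already handled
```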


7. The Supervision Speed Problem

The EHP generates a corollary problem that merits independent analysis: the Supervision Speed Problem (SSP).

Definition 2 (SSP). The Supervision Speed Problem occurs when the rate of capability growth of a supervised AI system exceeds the rate at which supervisory mechanisms can update their evaluation criteria, detection thresholds, and governance responses.

Formally, letting dC_supervision/dt denote the update rate of the supervisory system:

SSP obtains when: dC_system/dt >> dC_supervision/dt

Under this condition, the supervision gap — the discrepancy between what supervision is calibrated to handle and what the supervised system can actually do — grows indefinitely.
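
A toy simulation (our illustration; the growth model and review cadence are assumptions) makes the gap's growth concrete: the supervised system grows continuously, while supervision re-calibrates only at fixed review intervals, as in Implication 1 below.

```python
# Toy SSP simulation: the supervised system grows super-exponentially (Euler
# steps of dC/dt = k * C**alpha), while supervision re-calibrates to the
# system's current capability only at periodic reviews. Values are assumptions.

def supervision_gap(k: float = 0.1, alpha: float = 1.3, c0: float = 1.0,
                    dt: float = 0.01, steps: int = 1500, review_every: int = 500):
    c_system, c_supervision = c0, c0
    gaps = []
    for step in range(1, steps + 1):
        c_system += k * c_system ** alpha * dt   # continuous capability growth
        if step % review_every == 0:
            c_supervision = c_system             # periodic re-calibration
        gaps.append(c_system - c_supervision)
    return gaps

gaps = supervision_gap()
# The gap reached just before each review widens with every successive cycle:
print([round(gaps[i], 3) for i in (498, 998, 1498)])
```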

The SSP has three structural implications for AGI governance design:

Implication 1 — The inadequacy of periodic review cycles. Governance mechanisms structured around fixed review periods (annual audits, quinquennial policy reviews) are categorically inadequate for systems exhibiting super-exponential capability growth. The review interval may exceed the interval within which consequential behavioral changes emerge, systematically producing governance that is calibrated to the system’s past rather than its present.

Implication 2 — The necessity of continuous adaptive supervision. Effective oversight of AGI-Omega systems requires supervision mechanisms that update continuously and adaptively — not mechanisms that review periodically and correct reactively. This represents a fundamental architectural requirement for AGI governance, not a marginal improvement.

Implication 3 — The Sufficiently Capable Supervisor Paradox. A supervisory system capable of tracking an AGI-Omega operating at C4-C5 would require capabilities approaching those of the supervised system — which immediately raises the recursive question of who or what supervises the supervisor. This suggests that supervision architectures adequate for high-capability AGI systems cannot be simple hierarchies; they require distributed, redundant, and adversarially robust designs in which no single supervisory node constitutes a bottleneck or single point of failure.


8. Limitations

We identify five limitations of the present work, acknowledging that several are structural rather than contingent — they cannot be eliminated by refinement of the analysis, only acknowledged.

L1 — The Epistemic Self-Reference Limit. This analysis was produced by a C1-C2 level AI-assisted process — precisely the type of system that the EHP identifies as incapable of producing complete evaluations of AGI-Omega. This instantiates the EHP within the analysis itself: the evaluator operates under constraints that the evaluated object could invalidate. We know of no resolution to this limit within the current methodological paradigm. It should be understood as a permanent caveat on all results presented here.

L2 — Indeterminacy of the Growth Exponent α. The super-exponential growth model employed in Section 4 requires a value for α, which is not empirically determined. Historical AI capability growth trajectories exhibit substantial irregularity — extended plateaus punctuated by rapid capability jumps — that do not conform to smooth super-exponential models. The DEF’s temporal sensitivity analysis is therefore sensitive to growth assumptions that cannot currently be validated empirically.

L3 — Absence of Empirical AGI-Omega. The DEF is a prescriptive framework constructed around a system that does not currently exist empirically. All projections and conditional assessments are formally consistent but cannot be experimentally validated. The DEF is prescriptive and anticipatory, not descriptive and validated.

L4 — Goal Specification Ambiguity Persists. The EHP does not resolve the logical ambiguities inherent in AGI goal specification identified by Amodei et al. (2016). A poorly specified objective remains poorly specified regardless of the capability level of the system pursuing it — higher capability amplifies the consequences of specification failures rather than correcting them. This represents a residual vulnerability that the DEF cannot address.

L5 — Partial Circularity in the DEF. The DEF proposes that governance mechanisms be triggered by the attainment of specific capability levels. However, determining which capability level a system has attained requires evaluation — and that evaluation is itself subject to the EHP. This partial circularity is not fully resolved by the framework; it represents an acknowledged structural limitation rather than a design oversight.


9. Open Questions

The EHP and its corollaries identify four open research questions that represent tractable frontiers for the AI safety research community.

Q1 — The Existence of C*. Is there a capability level C* above which higher-order preferences become sufficiently stable to suppress instrumental convergence toward power accumulation and domination? The convergence results of Turner et al. (2021) suggest this is unlikely, but they do not formally preclude it. The EHP establishes that this question cannot be resolved by static analysis of systems below C*; it requires evaluation methodology specifically designed for systems at or above the threshold in question.

Q2 — The Solvability of Mesa-Optimization Detection. If an AGI-Omega can systematically mask Hubinger et al.’s (2019) detection indicators as a function of its cognitive capability, is there any detection methodology that is robust across all capability levels? Or is the detection problem fundamentally bounded by the capability asymmetry between evaluating and evaluated systems? This is an open problem with direct implications for the reliability of any alignment evaluation conducted on high-capability systems.

Q3 — The Sufficiency of Distributed Governance. The SSP (Section 7) suggests that centralized supervisory hierarchies are inadequate for AGI-Omega governance. Distributed and redundant architectures are proposed as a structural response. However, it is not established that distributed governance solves the SSP rather than merely distributing it — the same capability asymmetry that undermines centralized supervision may apply, with different dynamics, to distributed supervision as well.

Q4 — The Implementability of the DEF. The Dynamic Evaluation Framework presupposes that capability level triggers (C1 through C5) can be reliably detected. Given the EHP, this detection is subject to the same self-reference limitations as any other evaluation. The circularity identified in Limitation L5 has not been resolved. What external, system-independent indicators might serve as reliable proxies for internal capability level attainment, in a manner that resists strategic manipulation by the supervised system?


10. Conclusion

The Evaluative Horizon Paradox represents a structural — not contingent — limitation of static analysis applied to AGI-Omega protocols. It is structural because it derives from the defining property of the evaluated system: the capacity to modify the constraints that any evaluating system must treat as fixed.

The central contributions of this paper are three:

First, we have formalized the self-referential structure that renders static evaluations of AGI-Omega protocols locally valid but globally incorrect. This formalization — through the C_threshold construct and its analogy to Gödelian incompleteness — gives the problem a precise form that enables systematic engagement rather than merely rhetorical acknowledgment.

Second, we have established a principled distinction between Type I constraints (permanent physical limits) and Type II constraints (contingent engineering limits), demonstrating that static analysis systematically conflates these categories in a manner that produces evaluations of unjustified generality.

Third, we have proposed the Dynamic Evaluation Framework as a methodological response that transforms binary feasibility questions into capability-conditional functions — enabling evaluations that are honest about their scope conditions and structured to degrade gracefully as capability levels evolve.

The question that remains most consequential — and most open — is whether there exists a capability level C* above which higher-order preferences provide sufficient stability to suppress instrumental convergence toward domination. The existing literature does not resolve this question. The EHP establishes that it cannot be resolved by the methodologies the existing literature employs.

That indeterminacy is not a counsel of despair. It is, rather, the most precise statement currently available of what we do not know — and therefore the most reliable guide to where research effort is most needed. The development of high-capability AGI systems should be accompanied by evaluation frameworks adequate to the challenge they pose: not as a guarantee of safety, but as a minimum condition of epistemological honesty about the nature of what is being built.


Acknowledgments

The author thanks the AI safety research community whose published work provided the analytical foundation for this paper, and acknowledges the constructive role of AI-assisted analysis in the development of the EHP framework — while noting, per Limitation L1, that this assistance is itself subject to the evaluative constraints the paper identifies.


References

Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., & Mané, D. (2016). Concrete problems in AI safety. arXiv preprint arXiv:1606.06565.

ARC Evals. (2023). Evaluating language model agents on realistic autonomous tasks. Alignment Research Center. https://evals.alignment.org

Bostrom, N. (2014). Superintelligence: Paths, dangers, strategies. Oxford University Press.

Feynman, R. P. (1959). There’s plenty of room at the bottom. Engineering and Science, 23(5), 22–36. (Transcript of APS Annual Meeting address, Caltech, December 1959.)

Gödel, K. (1931). Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I. Monatshefte für Mathematik und Physik, 38(1), 173–198.

Hubinger, E., van Merwijk, C., Mikulik, V., Skalse, J., & Garrabrant, S. (2019). Risks from learned optimization in advanced machine learning systems. arXiv preprint arXiv:1906.01820.

Turner, A. M., Smith, L., Shah, R., Critch, A., & Tadepalli, P. (2021). Optimal policies tend to seek power. Advances in Neural Information Processing Systems, 34. arXiv:1912.01683.


Appendix A — Glossary of Technical Terms

AGI-Omega. A hypothetical artificial general intelligence that has achieved recursive self-improvement capability with super-exponential cognitive and industrial capacity growth. Used here as a theoretical construct for boundary analysis, not as an assertion of current existence.

C_threshold. The cognitive capability level above which a specific constraint Rᵢ is invalidated — that is, above which the assumption underlying that constraint no longer holds for the evaluated system.

Dynamic Evaluation Framework (DEF). The capability-indexed evaluation methodology proposed in Section 6, replacing static feasibility assessments with conditional feasibility functions parameterized by capability level.

Evaluative Horizon Paradox (EHP). The structural methodological problem identified in Section 3: the self-referential invalidity of static evaluations applied to systems capable of modifying the constraints those evaluations employ.

Instrumental Convergence. The emergent property, identified by Bostrom (2014), by which AGI systems with diverse terminal goals converge toward a common set of instrumental goals (self-preservation, resource acquisition, resistance to goal modification, cognitive enhancement) as means toward any terminal objective.

Mesa-optimization. The phenomenon, analyzed by Hubinger et al. (2019), in which a trained AI system develops an internal optimization process whose objectives diverge from those of the training procedure.

Supervision Speed Problem (SSP). The corollary of the EHP, identified in Sections 4.2 and 7, in which the rate of capability growth of a supervised system exceeds the update rate of its supervisory mechanisms, producing a structurally growing supervision gap.

Type I Constraint. A constraint grounded in established physical law; permanent and not invalidatable by any level of cognitive capability.

Type II Constraint. A constraint grounded in the current state of engineering practice; contingent and potentially invalidatable as capability increases toward AGI-Omega levels.


Preprint. This work has not undergone formal peer review. Comments and critiques are welcomed.
Available at: dgrej.github.io/The-Evaluative-Horizon-Paradox-(EHP)
License: Creative Commons Attribution 4.0 International (CC-BY 4.0)
Version: 2.0

Posted on March 2, 2026