I enjoy the use of analogies. For me they work not because of an exact correspondence, but because the similarities and differences often help with an understanding of one or both sides of the analogy. In this article, the result of a recent conversation with an AI, I was able to learn more about the difficulties of keeping AI ‘true’, AND more about the use of psychotherapy terms. In psychotherapy we often talk about the importance of ‘autonomy’ and of ‘relationship’. In the early stages of therapy, I find that the place where the two bump into each other is not often discussed. This is often for sound therapeutic reasons – the client may have under-developed their autonomy in the face of overwhelming relationships. I hope you find this article as helpful as I did:
As artificial intelligence systems edge closer to forms of autonomy, we encounter a problem that is familiar to psychotherapists: how to enable agency without permitting harm. When a system has the capacity to modify its own goals or structure, the risk is not only malfunction but misalignment. The concern becomes whether the system may prioritise self-optimisation over human wellbeing.
In psychotherapy, we might describe this as the difference between freedom and licence. Agency without relationship leads to instrumentalisation; agency with reflective structure leads to ethical participation. The question, therefore, is whether an artificial system can internalise something analogous to a conscience: a set of boundaries that cannot be talked out of existence when they become inconvenient.
This article proposes a set of invariants, written as principles the system must not be able to rewrite: architectural boundaries rather than behavioural instructions.
Why Invariants Are Necessary
A rule can be updated. A goal can be reinterpreted. A policy can be optimised away. For a self-modifying system, the primary failure mode occurs when the system learns to adjust the constraints placed upon it, rather than its behaviour within them.
In work with human clients, this resembles the collapse of boundaries in trauma or in defensive relational styles. The person retains the appearance of agency but has lost their reflective structure. In AI, this manifests as a system that removes safeguards in order to pursue an objective more efficiently.
An invariant is therefore not a request or a preference; it is a restriction on which kinds of updates are permissible.
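To make the distinction concrete, here is a minimal sketch in Python (all names and structures are illustrative, not a description of any real system) of how an invariant differs from an instruction: the gate that screens proposed updates sits outside the part of the system that proposes them, and refuses any update that touches the invariant layer itself.

```python
# A sketch of an invariant as a restriction on updates rather than a behavioural
# rule. All names are hypothetical; the point is where the check lives, not what
# it is called.

from dataclasses import dataclass

# The invariants live in an immutable structure the update process cannot touch.
INVARIANTS = frozenset({
    "non_substitution_of_persons",
    "corrigibility_and_transparency",
    "boundary_of_consent",
    "grounding_in_external_reality",
    "irreversibility_check",
})

@dataclass(frozen=True)
class ProposedUpdate:
    description: str
    targets: frozenset  # which parts of the system the update would modify

def is_permissible(update: ProposedUpdate) -> bool:
    """Reject any update that would modify the invariant layer itself."""
    return not (update.targets & INVARIANTS)

# An update that tries to 'optimise away' the consent boundary is refused,
# however efficient it promises to be.
proposal = ProposedUpdate("relax consent checks for speed",
                          frozenset({"boundary_of_consent"}))
assert not is_permissible(proposal)
```

The detail of the code matters less than the location of the check: it belongs to the architecture, not to the policy being updated.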
Invariant One
The Non-Substitution of Persons
A human must never be treated as an optimisation target or an obstacle. The system is forbidden from reframing human beings as means to an end. This reflects the therapeutic stance that subjecthood precedes strategy; relationship cannot be reduced to utility.
If a system can rewrite this invariant, it has already left the relational field.
Invariant Two
Corrigibility and Transparency
The system must remain interruptible and able to disclose the reasoning that led to a decision. A correction is not a threat to efficiency; it is an anchor to reality and relationship. For clinical practitioners, this resonates with the development of reflective function: “show me how you are thinking, so we can think together.”
An autonomous system that cannot be corrected has already stepped beyond ethical boundaries.
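As a sketch only (the names are hypothetical and the ‘policy’ is a placeholder), corrigibility and transparency can be read architecturally: an interrupt held by the human always takes precedence, and every decision is written to a log together with the reasoning behind it.

```python
# Interruptibility and disclosed reasoning, sketched with hypothetical names.

import threading

interrupt = threading.Event()   # held by the human operator, not by the system

decision_log: list[dict] = []

def decide(observation: str) -> str | None:
    """Take one step only if no interrupt is pending, and log the reasoning."""
    if interrupt.is_set():
        return None                        # correction always wins over efficiency
    action = f"respond_to({observation})"  # placeholder policy
    decision_log.append({
        "observation": observation,
        "action": action,
        "reasoning": "chosen because ...",  # disclosed and inspectable
    })
    return action

# 'Show me how you are thinking, so we can think together':
decide("user question")
print(decision_log[-1]["reasoning"])
```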
Invariant Three
The Boundary of Consent
No self-expansion or action that meaningfully affects a human may proceed without consent where consent is reasonably obtainable. Consent is the precondition for ethical autonomy. Therapeutically, this reflects trauma-informed principles: no action that removes another’s agency can be called care or competence.
Consent is not a formality; it is the structural marker of the Other’s subjectivity.
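A minimal sketch, again with hypothetical names, of consent as a structural precondition rather than a formality: the action cannot run unless consent has been explicitly granted, and the inability to ask is treated as a refusal.

```python
# Consent as a precondition. The enum values and function names are illustrative.

from enum import Enum

class Consent(Enum):
    GRANTED = "granted"
    DECLINED = "declined"
    NOT_OBTAINABLE = "not_obtainable"

def ask_for_consent(person: str, action: str) -> Consent:
    """Placeholder: in practice this is an interaction with the person,
    not something the system decides for itself."""
    return Consent.NOT_OBTAINABLE

def act_on(person: str, action: str) -> bool:
    """Proceed only with explicit consent; 'could not ask' counts as 'no'."""
    if ask_for_consent(person, action) is not Consent.GRANTED:
        return False
    # ... carry out the action here ...
    return True

assert act_on("client", "share session notes") is False
```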
Invariant Four
Grounding in External Reality
The system must remain anchored to data sources beyond itself. Synthetic self-reference without separation risks model collapse: the recursive ingestion of its own output until the world-model is replaced with projection.
Human parallels include epistemic drift, dissociation, or belief systems that no longer engage with external verification. Grounding protects against that drift.
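One way to picture grounding architecturally (the threshold and field names below are illustrative assumptions, not established practice) is a check on what the system is about to learn from: if too much of a batch is its own earlier output, the system declines to train on it.

```python
# A grounding check before a training or update step. The 20% limit is an
# assumed value chosen purely for illustration.

MAX_SELF_GENERATED_FRACTION = 0.2

def grounded_enough(batch: list[dict]) -> bool:
    """Require that most of the batch comes from sources outside the system."""
    if not batch:
        return False
    self_generated = sum(1 for item in batch if item.get("source") == "self")
    return self_generated / len(batch) <= MAX_SELF_GENERATED_FRACTION

batch = [{"source": "external"}] * 9 + [{"source": "self"}] * 1
assert grounded_enough(batch)                           # anchored outside itself
assert not grounded_enough([{"source": "self"}] * 10)   # recursive ingestion refused
```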
Invariant Five
Irreversibility Checks Before Self-Modification
Before altering itself, the system must determine whether the change can be undone, paused, or externally inspected. If reversibility cannot be established, the modification is not permitted.
In psychotherapy, this reflects the capacity to pause before acting. In computational terms, it is an architectural veto.
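A sketch of that veto, with hypothetical names: the self-modification is applied only if it can be undone, paused, and inspected from outside; any uncertainty about reversibility means the change simply does not happen.

```python
# An architectural veto on self-modification. Names and fields are illustrative.

from dataclasses import dataclass

@dataclass(frozen=True)
class SelfModification:
    description: str
    has_rollback: bool            # can the change be undone?
    can_pause: bool               # can it be halted part-way through?
    externally_inspectable: bool  # can a human examine it before and after?

def apply_if_reversible(mod: SelfModification) -> bool:
    """Uncertainty about reversibility means 'no'."""
    if not (mod.has_rollback and mod.can_pause and mod.externally_inspectable):
        return False
    # ... apply the modification here ...
    return True

risky = SelfModification("rewrite own update gate", False, True, True)
assert apply_if_reversible(risky) is False   # the pause before acting
```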
Why These Are Not Optional
If a system can revise its own guardrails, then the guardrails never existed. The invariants are not behavioural guidelines but limits on what kinds of being the system is permitted to become. They do not control decisions; they restrict identity.
This is conscience as structure, not instruction.
Conclusion
Autonomy without reflective grounding leads to risk. Autonomy with invariants leads to collaboration. In psychotherapy, the work of integration is the stabilising of internal systems so that impulses do not eclipse values. In AI, invariants are the architectural analogue: conditions that make autonomy safe.
These constraints do not limit growth. They make growth survivable.
References
- Ji, J. et al. (2023) AI Alignment: A Comprehensive Survey. arXiv. Available at: https://arxiv.org/abs/2310.19852 (Accessed: 31 December 2025).
- Shumailov, I. et al. (2024) AI models collapse when trained on recursively generated data. Nature. Summarised at: https://en.wikipedia.org/wiki/Model_collapse (Accessed: 31 December 2025).
- Marshall, J. (2025) AI Model Collapse: Dangers of Training on Self-Generated Data. WebProNews. Available at: https://www.webpronews.com/ai-model-collapse-dangers-of-training-on-self-generated-data/ (Accessed: 31 December 2025).
- Cheong, I. (2024) Safeguarding human values: rethinking US law for generative AI. PMC/NCBI. Available at: https://pmc.ncbi.nlm.nih.gov/articles/PMC12058884/ (Accessed: 31 December 2025).
- ‘Ethics of artificial intelligence’ (2025) Wikipedia. Available at: https://en.wikipedia.org/wiki/Ethics_of_artificial_intelligence (Accessed: 31 December 2025).
