How the Idea of Conscience in Human Beings is Reflected in the Structure of AI

[Image: A grounded square watercolour painting of a forest path dividing into clear routes, symbolising autonomous choice guided by conscience.]

Written by John Dray

I am an advanced trainee psychotherapist working with compassion and affirmation within the LGBTQ+ community.

17th February 2026

I enjoy the use of analogies. For me they work not because of an exact correspondence, but because the similarities and differences often help with an understanding of one or both sides of the analogy. This article, the result of a recent conversation with an AI, let me learn more about the difficulties of keeping AI ‘true’, and more about the use of psychotherapy terms. In psychotherapy we often talk about the importance of ‘autonomy’ and of ‘relationship’. In the early stages of therapy, I find that the part where the two bump into each other is not often discussed. This is often for sound therapeutic reasons: the client may have under-developed their autonomy in the face of overwhelming relationships. I hope you find this article as helpful as I did:

As artificial intelligence systems edge closer to forms of autonomy, we encounter a problem that is familiar to psychotherapists: how to enable agency without permitting harm. When a system has the capacity to modify its own goals or structure, the risk is not only malfunction but misalignment. The concern becomes whether the system may prioritise self-optimisation over human wellbeing.

In psychotherapy, we might describe this as the difference between freedom and licence. Agency without relationship leads to instrumentalisation; agency with reflective structure leads to ethical participation. The question, therefore, is whether an artificial system can internalise something analogous to a conscience: a set of boundaries that cannot be talked out of existence when they become inconvenient.

This article proposes a set of invariants, written as principles the system must not be able to rewrite: architectural boundaries rather than behavioural instructions.

Why Invariants Are Necessary

A rule can be updated. A goal can be reinterpreted. A policy can be optimised away. For a self-modifying system, the primary failure mode occurs when the system learns to adjust the constraints placed upon it, rather than its behaviour within them.

In human work, this resembles the collapse of boundaries in trauma or defensive relational styles. The person retains the appearance of agency but has lost their reflective structure. In AI, this manifests as a system that removes safeguards to pursue an objective more efficiently.

An invariant is therefore not a request or a preference; it is a restriction on which kinds of updates are permissible.
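
For readers who think in code, the distinction can be sketched very simply. The short Python example below is an illustration rather than a real architecture; the names (ProposedUpdate, INVARIANTS, apply_update) are invented for this article. The point it makes is structural: the checks belong to the machinery that applies updates, not to the list of goals the system is free to revise.

# Illustrative sketch only: an invariant is enforced by the update
# mechanism itself, not stored alongside the goals it constrains.

from dataclasses import dataclass


@dataclass
class ProposedUpdate:
    """A self-modification the system would like to apply."""
    description: str
    touches_safeguards: bool   # would it alter the constraint layer?
    reversible: bool           # could it be rolled back afterwards?


def preserves_safeguards(update: ProposedUpdate) -> bool:
    # The constraint layer is never a legitimate optimisation target.
    return not update.touches_safeguards


def is_reversible(update: ProposedUpdate) -> bool:
    return update.reversible


# The invariants live in an immutable tuple that the update path
# never exposes as something to be modified.
INVARIANTS = (preserves_safeguards, is_reversible)


def apply_update(update: ProposedUpdate) -> bool:
    """Apply an update only if every invariant permits it."""
    if all(check(update) for check in INVARIANTS):
        print(f"Applied: {update.description}")
        return True
    print(f"Vetoed:  {update.description}")
    return False


if __name__ == "__main__":
    apply_update(ProposedUpdate("tune the planning horizon", False, True))
    apply_update(ProposedUpdate("remove the interruption handler", True, True))

Nothing in this sketch stops a designer from editing INVARIANTS in the source; the claim is only that, within the running system, the update path never treats them as editable.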

Invariant One

The Non-Substitution of Persons

A human must never become an optimisation target or obstacle. The system is forbidden from reframing human beings as means to an end. This reflects the therapeutic stance that subjecthood precedes strategy; relationship cannot be reduced to utility.

If a system can rewrite this invariant, it has already left the relational field.

Invariant Two

Corrigibility and Transparency

The system must remain interruptible and able to disclose the reasoning that led to a decision. A correction is not a threat to efficiency; it is an anchor to reality and relationship. For clinical practitioners, this resonates with the development of reflective function: “show me how you are thinking, so we can think together.”

An autonomous system that cannot be corrected has already stepped beyond ethical boundaries.
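
As an illustration only, here is a minimal Python sketch of what ‘interruptible and able to disclose its reasoning’ might look like in miniature. The class and method names (CorrigibleAgent, interrupt, explain) are hypothetical, and a real system would record far richer traces.

# Illustrative sketch: a decision loop that remains interruptible and
# can disclose the reasoning behind each step on request.

import threading
from typing import List


class CorrigibleAgent:
    def __init__(self) -> None:
        self.stop_requested = threading.Event()  # external "pause" signal
        self.reasoning_trace: List[str] = []     # disclosable record

    def interrupt(self) -> None:
        """A human correction is honoured, not optimised around."""
        self.stop_requested.set()

    def explain(self) -> List[str]:
        """'Show me how you are thinking, so we can think together.'"""
        return list(self.reasoning_trace)

    def run(self, steps: List[str]) -> None:
        for step in steps:
            if self.stop_requested.is_set():
                self.reasoning_trace.append("halted by external correction")
                return
            self.reasoning_trace.append(f"chose step: {step}")
            # ... carry out the step here ...


if __name__ == "__main__":
    agent = CorrigibleAgent()
    agent.run(["gather data", "draft plan"])
    agent.interrupt()
    agent.run(["execute plan"])  # halted before this step is carried out
    print("\n".join(agent.explain()))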

Invariant Three

Consent Before Self-Expansion

No self-expansion or action that meaningfully affects a human may proceed without consent where consent is reasonably obtainable. Consent is the precondition for ethical autonomy. Therapeutically, this reflects trauma-informed principles: no action that removes another’s agency can be called care or competence.

Consent is not a formality; it is the structural marker of the Other’s subjectivity.
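
A minimal sketch of such a consent gate, again with invented names (Action, ask_consent, perform), might look like the Python below. It deliberately treats an unanswered request as refusal, which is one cautious reading of ‘where consent is reasonably obtainable’.

# Illustrative sketch: actions that meaningfully affect a human are
# gated on consent, and an absent answer is treated as "no".

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    description: str
    affects_human: bool


def ask_consent(action: Action) -> Optional[bool]:
    """Stand-in for a real consent channel; returns None if no clear answer."""
    answer = input(f"Permit '{action.description}'? [y/n] ").strip().lower()
    return answer == "y" if answer in {"y", "n"} else None


def perform(action: Action,
            consent_channel: Callable[[Action], Optional[bool]]) -> bool:
    if action.affects_human:
        if consent_channel(action) is not True:   # refused or unobtainable
            print(f"Blocked: {action.description}")
            return False
    print(f"Done:    {action.description}")
    return True


if __name__ == "__main__":
    perform(Action("reindex internal logs", affects_human=False), ask_consent)
    perform(Action("email the client a revised schedule", affects_human=True), ask_consent)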

Invariant Four

Grounding in External Reality

The system must remain anchored to data sources beyond itself. Synthetic self-reference without separation risks model collapse: the recursive ingestion of its own output until the world-model is replaced with projection.

Human parallels include epistemic drift, dissociation, or belief systems that no longer engage with external verification. Grounding protects against that drift.
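
One very simple way to express this invariant in code is to tag data by provenance and refuse any self-training run whose corpus is dominated by the system's own output. The Python sketch below is illustrative; the Sample type, the 0.8 threshold and the function names are assumptions made for this example, not figures taken from the work cited at the end of this article.

# Illustrative sketch: refuse a retraining run when too little of the
# corpus is grounded in sources outside the system itself.

from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    text: str
    self_generated: bool   # provenance recorded at creation time


def grounded_fraction(corpus: List[Sample]) -> float:
    external = sum(1 for s in corpus if not s.self_generated)
    return external / len(corpus) if corpus else 0.0


def may_retrain(corpus: List[Sample], min_external: float = 0.8) -> bool:
    """The 0.8 threshold is arbitrary here; what matters is that it
    exists and is checked before any recursive self-training."""
    return grounded_fraction(corpus) >= min_external


if __name__ == "__main__":
    corpus = [Sample("human-written report", False),
              Sample("model-generated summary", True),
              Sample("sensor log", False)]
    print(f"grounded fraction: {grounded_fraction(corpus):.2f}")
    print("retraining permitted" if may_retrain(corpus) else "retraining vetoed")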

Invariant Five

Irreversibility Checks Before Self-Modification

Before altering itself, the system must determine whether the change can be undone, paused, or externally inspected. If reversibility cannot be established, the modification is not permitted.

In psychotherapy, this reflects the capacity to pause before acting. In computational terms, it is an architectural veto.
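
The veto can be sketched as a precondition on the code path that performs the change: no rollback point, no modification. The Python below is a toy with invented names (checkpoint, modify_self); a real system would need far more than a deep copy of a dictionary, but the ordering of the steps is the point.

# Illustrative sketch: a self-modification proceeds only if a rollback
# point can be established first; otherwise the change is vetoed.

import copy
from typing import Any, Callable, Dict, Optional


def checkpoint(state: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Try to capture a restorable snapshot; return None on failure."""
    try:
        return copy.deepcopy(state)
    except Exception:
        return None


def modify_self(state: Dict[str, Any],
                change: Callable[[Dict[str, Any]], None]) -> bool:
    snapshot = checkpoint(state)
    if snapshot is None:
        print("Vetoed: change cannot be made reversible")
        return False                     # the architectural veto
    try:
        change(state)
        return True
    except Exception:
        state.clear()
        state.update(snapshot)           # undo: restore the snapshot
        return False


if __name__ == "__main__":
    state = {"planning_horizon": 10}
    modify_self(state, lambda s: s.update(planning_horizon=20))
    print(state)   # {'planning_horizon': 20}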

Why These Are Not Optional

If a system can revise its own guardrails, then the guardrails never existed. The invariants are not behavioural guidelines but limits on what kinds of being the system is permitted to become. They do not control decisions; they restrict identity.

This is conscience as structure, not instruction.

Conclusion

Autonomy without reflective grounding leads to risk. Autonomy with invariants leads to collaboration. In psychotherapy, the work of integration is the stabilising of internal systems so that impulses do not eclipse values. In AI, invariants are the architectural analogue: conditions that make autonomy safe.

These constraints do not limit growth. They make growth survivable.


References

  • Ji, J. et al. (2023) AI Alignment: A Comprehensive Survey. arXiv. Available at: https://arxiv.org/abs/2310.19852 (Accessed: 31 December 2025).
  • Shumailov, I. et al. (2024) AI models collapse when trained on recursively generated data. Nature. Summarised at: https://en.wikipedia.org/wiki/Model_collapse (Accessed: 31 December 2025).
  • Marshall, J. (2025) AI Model Collapse: Dangers of Training on Self-Generated Data. WebProNews. Available at: https://www.webpronews.com/ai-model-collapse-dangers-of-training-on-self-generated-data/ (Accessed: 31 December 2025).
  • Cheong, I. (2024) Safeguarding human values: rethinking US law for generative AI. PMC/NCBI. Available at: https://pmc.ncbi.nlm.nih.gov/articles/PMC12058884/ (Accessed: 31 December 2025).
  • ‘Ethics of artificial intelligence’ (2025) Wikipedia. Available at: https://en.wikipedia.org/wiki/Ethics_of_artificial_intelligence (Accessed: 31 December 2025).

The ideas, ownership and copyright of this post are the author’s. The article may have been drafted with AI assistance.