Essay roadmap

Planned Essays

Each essay starts with my claim, then builds the path from intuition to mechanism, failure modes, and design implications. References are evidence, not the voice of the piece.

01 / Manifesto

The Agent Boundary

The security boundary of an AI agent is not the model. It is the point where natural language gains authority over tools, identity, memory, and external state.

System model: User intent, model, agent loop, tools, memory, and state.
Failure modes: Mixed-trust instructions, overbroad tools, poisoned memory, and misplaced human trust.
Design stance: Agent design should be capability design.

02 / Security

Prompt Injection Is a Trust Boundary Failure

Prompt injection is poorly framed as prompt engineering. It is a mixed-trust input problem that becomes dangerous when the model can call tools or modify state.

Mechanism: The context window collapses system instructions, user goals, retrieved data, tool output, and memory.
Failure modes: Retrieved documents override intent; tool output smuggles instructions; poisoned memory persists influence.
Design stance: Label trust zones and bind high-impact actions to explicit policy.

03 / Foundations

Why Self-Attention Replaced RNNs

Self-attention displaced recurrence because it changed sequence modeling from step-by-step state passing into parallel, content-addressed interaction between tokens.

Intuition: An RNN whispers a summary forward; a transformer lets every token inspect every other token.
Mechanism: Queries, keys, values, attention scores, positional information, and representation mixing.
Design stance: The strength of pairwise interaction creates the scaling pressure behind long-context and retrieval systems.

Read draft essay

04 / State

Memory Poisoning Is Persistent State Corruption

Agent memory should be analyzed as mutable application state, not as harmless conversation history.

System model: Short-term context, long-term memory, shared memory, summaries, and provenance.
Failure modes: Attacker preferences, poisoned summaries, cross-session leakage, and shared workflow corruption.
Design stance: Memory writes need integrity checks, snapshots, rollback, and access control.