Essay roadmap

Planned Essays

Each essay starts with my claim, then builds the path from intuition to mechanism, failure modes, and design implications. References are evidence, not the voice of the piece.

01 / Manifesto

The Agent Boundary

The security boundary of an AI agent is not the model. It is the point where natural language gains authority over tools, identity, memory, and external state.

System model
User intent, model, agent loop, tools, memory, and state.
Failure modes
Mixed-trust instructions, overbroad tools, poisoned memory, and misplaced human trust.
Design stance
Agent design should be capability design.
02 / Security

Prompt Injection Is a Trust Boundary Failure

Prompt injection is poorly framed as prompt engineering. It is a mixed-trust input problem that becomes dangerous when the model can call tools or modify state.

Mechanism
The context window collapses system instructions, user goals, retrieved data, tool output, and memory.
Failure modes
Retrieved documents override intent; tool output smuggles instructions; poisoned memory persists influence.
Design stance
Label trust zones and bind high-impact actions to explicit policy.
03 / Foundations

Why Self-Attention Replaced RNNs

Self-attention displaced recurrence because it changed sequence modeling from step-by-step state passing into parallel, content-addressed interaction between tokens.

Intuition
An RNN whispers a summary forward; a transformer lets every token inspect every other token.
Mechanism
Queries, keys, values, attention scores, positional information, and representation mixing.
Design stance
The strength of pairwise interaction creates the scaling pressure behind long-context and retrieval systems.

Read draft essay

04 / State

Memory Poisoning Is Persistent State Corruption

Agent memory should be analyzed as mutable application state, not as harmless conversation history.

System model
Short-term context, long-term memory, shared memory, summaries, and provenance.
Failure modes
Attacker preferences, poisoned summaries, cross-session leakage, and shared workflow corruption.
Design stance
Memory writes need integrity checks, snapshots, rollback, and access control.