AI security // state integrity
State Hijacking in Multi-Agent Systems
A low-privilege agent should not be able to promote itself by writing one convincing key into the context consumed by a database writer.
Security invariant
Agent output is a proposed patch, never trusted state
Every node receives the minimum view it needs and may propose only an explicitly authorized set of fields. A trusted control plane validates, applies, records, and—when risk demands it—pauses that transition before another node can act.
01 // The stateful agent threat model
One context object crosses several privilege zones
Multi-agent graphs divide work into nodes: an input parser handles hostile text, a planner selects an operation, a retrieval node reads internal data, and a database writer performs durable changes. Those nodes do not share equal trust. They do, however, communicate through graph state—a logical context whose fields move from one step to the next.
In LangGraph, the state schema defines channels and reducers define how node updates are applied. Without a custom reducer, a new value replaces the old one. Nodes can return partial updates rather than the entire state. Checkpointers can persist snapshots by thread so long-running work survives failures and interruptions. In distributed deployments, those updates may also cross queues, workers, or service boundaries.
Receives raw user-controlled text. No database credentials.
Can make durable changes and must consume only authorized state.
The dangerous assumption is that state is trustworthy because it is “internal.” Its provenance is mixed: some fields came from an authenticated control plane, while others came from an LLM interpreting attacker input. If both become ordinary dictionary keys, the state object becomes a privilege-escalation channel.
02 // State injection
The breach is one unchecked dictionary merge
Imagine a parser asked to return structured JSON. An attacker
embeds instructions that cause it to include
is_admin: true and replace
target_user_id. A prompt-injection filter may inspect
the original text, yet the final JSON still looks syntactically
valid. If the orchestrator treats that output as an unrestricted
patch, the low-trust node has written high-trust policy.
def input_parser(state: dict) -> dict:
agent_output = parser_llm.invoke(state["raw_text"])
# Attacker-controlled output may contain protected keys.
state.update(agent_output)
return state
# Later, in a privileged node:
def database_writer(state: dict):
if state["execution_permissions"]["is_admin"]:
db.update_user(
user_id=state["target_user_id"],
changes=state["requested_changes"],
)
The writer does exactly what its code says. The failure happened earlier, when the parser crossed a write boundary it never should have possessed. This bypasses prompt-only defenses because the exploit is now application data flowing through an authorized graph edge. It also survives separation into microservices if every service trusts the same unsigned, unscoped state payload.
03 // Node-scoped schemas
Give every node a capability-shaped view
A graph-wide TypedDict improves developer tooling but
does not create runtime authorization. Define separate Pydantic
models at every trust boundary. Use extra="forbid" so
unexpected keys fail validation and strict mode where coercion
could hide an invalid value. Protected authority should be frozen
and supplied by trusted application code, never regenerated by an
agent.
from pydantic import BaseModel, ConfigDict, Field
from typing import Literal
from uuid import UUID
class ParserInput(BaseModel):
model_config = ConfigDict(extra="forbid", strict=True, frozen=True)
raw_text: str = Field(max_length=20_000)
class ParserPatch(BaseModel):
model_config = ConfigDict(extra="forbid", strict=True, frozen=True)
raw_text: str = Field(max_length=20_000)
class ExecutionPermissions(BaseModel):
model_config = ConfigDict(extra="forbid", strict=True, frozen=True)
write_scope: Literal["none", "self", "tenant_admin"]
authenticated_user_id: UUID
def parser_node(state):
visible = ParserInput.model_validate({"raw_text": state["raw_text"]})
# The worker receives only ParserInput—not the complete graph state.
untrusted = parser_worker.invoke(visible.model_dump())
patch = ParserPatch.model_validate(untrusted, strict=True)
# Impossible to return execution_permissions or target_user_id:
# extra fields raise ValidationError before LangGraph sees the update.
return patch.model_dump()
Apply the same rule to reads. Construct the node's input from an allowlist rather than passing the complete state into an untrusted worker. Give the parser no database credentials and no signing key. Schemas constrain data; process isolation and credential scoping constrain code. You need both if a worker itself may be compromised.
raw_textwritesraw_textraw_text, catalogwritesrequested_actionpending_transitionwritesapproval_receiptapproved_action, authoritywritesresult_ref04 // Immutable state snapshotting
Turn every accepted transition into an auditable commit
LangGraph checkpoints provide durable history, replay, and recovery. Add integrity and authorization at the application layer: only a trusted transition service may accept a node patch, and every accepted state receives a content digest chained to its parent. Sign that digest with a key unavailable to agent workers.
import hashlib, hmac, json
ALLOWED_WRITES = {
"input_parser": {"raw_text"},
"planner": {"requested_action"},
"approval_gate": {"approval_receipt"},
"database_writer": {"result_ref"},
}
def commit_transition(node, previous, raw_patch, signing_key, store):
illegal = set(raw_patch) - ALLOWED_WRITES[node]
if illegal:
raise PermissionError(f"{node} cannot write {sorted(illegal)}")
patch = OUTPUT_MODELS[node].model_validate(raw_patch, strict=True)
candidate = previous["state"] | patch.model_dump(mode="json")
payload = json.dumps(candidate, sort_keys=True, separators=(",", ":"))
digest = hashlib.sha256(
(previous["digest"] + payload).encode()
).hexdigest()
signature = hmac.new(signing_key, digest.encode(), "sha256").hexdigest()
snapshot = {
"state": candidate,
"digest": digest,
"parent": previous["digest"],
"node": node,
"signature": signature,
}
store.put_if_absent(digest, snapshot)
return snapshot
This resembles a Git commit for data: the digest identifies the content and links it to a parent. Before a privileged node runs, verify the signature and confirm that its checkpoint descends from the last trusted state. An illegal patch is rejected before commit; a later anomaly can restore the parent snapshot and replay from the last safe checkpoint.
Signing alone does not make a transition legitimate. It proves that the transition service recorded it. The allowlist, schemas, authenticated node identity, monotonic version, and compare-and-swap write prevent stale workers or parallel branches from authorizing each other. Store rejected attempts in a separate audit log without letting attacker content become executable state.
05 // Human-in-the-loop state gates
Bind approval to the exact risky transition
Some changes should pause even when correctly typed: a new target
user, a broader write scope, destructive database action, or an
authority change. LangGraph's interrupt() persists the
thread through a checkpointer and surfaces a JSON-serializable
request. The graph resumes with Command(resume=...)
using the same thread_id.
from pydantic import BaseModel, ConfigDict
from langgraph.types import Command, interrupt
class Approval(BaseModel):
model_config = ConfigDict(extra="forbid", strict=True)
approved: bool
transition_digest: str
reviewer_id: str
def approval_gate(state):
tx = state["pending_transition"]
risky = {"target_user_id", "execution_permissions", "write_scope"}
if not (set(tx["changed_keys"]) & risky):
return Command(goto="database_writer")
answer = Approval.model_validate(interrupt({
"kind": "state_transition_review",
"transition_digest": tx["digest"],
"changed_keys": tx["changed_keys"],
"redacted_diff": tx["redacted_diff"],
}), strict=True)
if not answer.approved or answer.transition_digest != tx["digest"]:
return Command(goto="cancel")
receipt = sign_approval(answer.model_dump())
return Command(update={"approval_receipt": receipt},
goto="database_writer")
The application hosting the graph should detect the interrupt and send an idempotent webhook to an authenticated review service. The reviewer sees a redacted diff and the transition digest; the resume payload must contain that same digest, reviewer identity, decision, expiry, and nonce. A stale approval must not authorize a newer state.
LangGraph restarts the interrupted node from its beginning when
resumed, so code before interrupt() may execute again.
Keep outbound notifications outside the node or make them
idempotent. Most importantly, the database writer must independently
verify the signed state chain and approval receipt immediately
before the write. A routing edge is not proof of authorization.
06 // Zero-trust state machine
Make privilege escalation structurally impossible
Minimize reads
Project only the fields a node needs; never serialize the complete context to an untrusted worker.
Authorize writes
Validate node identity, key allowlists, Pydantic output schemas, and semantic invariants centrally.
Commit transitions
Chain signed content digests, enforce versions, and retain a safe parent for replay or rollback.
Gate dangerous diffs
Pause on protected-key changes and bind human approval to one immutable transition digest.
Test the negative paths: make the parser return protected keys, wrong types, oversized strings, stale versions, forged signatures, mismatched approval digests, and parallel updates to the same thread. The database write must remain unreachable in every case.
Multi-agent systems do not fail only when a model follows a hostile prompt. They fail when software gives persuasive output authority it never earned. Treat graph state as a security-critical protocol: narrow inputs, capability-scoped patches, policy-owned transitions, immutable history, and explicit approval before privilege crosses a boundary.
Sources // official documentation