State Hijacking in Multi-Agent Systems: Securing LangGraph Context

Security invariant

Agent output is a proposed patch, never trusted state

Every node receives the minimum view it needs and may propose only an explicitly authorized set of fields. A trusted control plane validates, applies, records, and—when risk demands it—pauses that transition before another node can act.

01 // The stateful agent threat model

One context object crosses several privilege zones

Multi-agent graphs divide work into nodes: an input parser handles hostile text, a planner selects an operation, a retrieval node reads internal data, and a database writer performs durable changes. Those nodes do not share equal trust. They do, however, communicate through graph state—a logical context whose fields move from one step to the next.

In LangGraph, the state schema defines channels and reducers define how node updates are applied. Without a custom reducer, a new value replaces the old one. Nodes can return partial updates rather than the entire state. Checkpointers can persist snapshots by thread so long-running work survives failures and interruptions. In distributed deployments, those updates may also cross queues, workers, or service boundaries.

UntrustedInput parser

Receives raw user-controlled text. No database credentials.

Security boundaryGraph state

Carries data, authority references, transition history, and routing decisions.

PrivilegedDatabase writer

Can make durable changes and must consume only authorized state.

The dangerous assumption is that state is trustworthy because it is “internal.” Its provenance is mixed: some fields came from an authenticated control plane, while others came from an LLM interpreting attacker input. If both become ordinary dictionary keys, the state object becomes a privilege-escalation channel.

02 // State injection

The breach is one unchecked dictionary merge

Imagine a parser asked to return structured JSON. An attacker embeds instructions that cause it to include is_admin: true and replace target_user_id. A prompt-injection filter may inspect the original text, yet the final JSON still looks syntactically valid. If the orchestrator treats that output as an unrestricted patch, the low-trust node has written high-trust policy.

unsafe_transition.pyDo not merge agent output globally

def input_parser(state: dict) -> dict:
    agent_output = parser_llm.invoke(state["raw_text"])
    # Attacker-controlled output may contain protected keys.
    state.update(agent_output)
    return state

# Later, in a privileged node:
def database_writer(state: dict):
    if state["execution_permissions"]["is_admin"]:
        db.update_user(
            user_id=state["target_user_id"],
            changes=state["requested_changes"],
        )

The writer does exactly what its code says. The failure happened earlier, when the parser crossed a write boundary it never should have possessed. This bypasses prompt-only defenses because the exploit is now application data flowing through an authorized graph edge. It also survives separation into microservices if every service trusts the same unsigned, unscoped state payload.

03 // Node-scoped schemas

Give every node a capability-shaped view

A graph-wide TypedDict improves developer tooling but does not create runtime authorization. Define separate Pydantic models at every trust boundary. Use extra="forbid" so unexpected keys fail validation and strict mode where coercion could hide an invalid value. Protected authority should be frozen and supplied by trusted application code, never regenerated by an agent.

secure_nodes.pyPydantic v2 boundary models

from pydantic import BaseModel, ConfigDict, Field
from typing import Literal
from uuid import UUID

class ParserInput(BaseModel):
    model_config = ConfigDict(extra="forbid", strict=True, frozen=True)
    raw_text: str = Field(max_length=20_000)

class ParserPatch(BaseModel):
    model_config = ConfigDict(extra="forbid", strict=True, frozen=True)
    raw_text: str = Field(max_length=20_000)

class ExecutionPermissions(BaseModel):
    model_config = ConfigDict(extra="forbid", strict=True, frozen=True)
    write_scope: Literal["none", "self", "tenant_admin"]
    authenticated_user_id: UUID

def parser_node(state):
    visible = ParserInput.model_validate({"raw_text": state["raw_text"]})

    # The worker receives only ParserInput—not the complete graph state.
    untrusted = parser_worker.invoke(visible.model_dump())
    patch = ParserPatch.model_validate(untrusted, strict=True)

    # Impossible to return execution_permissions or target_user_id:
    # extra fields raise ValidationError before LangGraph sees the update.
    return patch.model_dump()

Apply the same rule to reads. Construct the node's input from an allowlist rather than passing the complete state into an untrusted worker. Give the parser no database credentials and no signing key. Schemas constrain data; process isolation and credential scoping constrain code. You need both if a worker itself may be compromised.

input_parserreadsraw_textwritesraw_text

plannerreadsraw_text, catalogwritesrequested_action

approval_gatereadspending_transitionwritesapproval_receipt

database_writerreadsapproved_action, authoritywritesresult_ref

04 // Immutable state snapshotting

Turn every accepted transition into an auditable commit

LangGraph checkpoints provide durable history, replay, and recovery. Add integrity and authorization at the application layer: only a trusted transition service may accept a node patch, and every accepted state receives a content digest chained to its parent. Sign that digest with a key unavailable to agent workers.

state_ledger.pyContent-addressed transition guard

import hashlib, hmac, json

ALLOWED_WRITES = {
    "input_parser": {"raw_text"},
    "planner": {"requested_action"},
    "approval_gate": {"approval_receipt"},
    "database_writer": {"result_ref"},
}

def commit_transition(node, previous, raw_patch, signing_key, store):
    illegal = set(raw_patch) - ALLOWED_WRITES[node]
    if illegal:
        raise PermissionError(f"{node} cannot write {sorted(illegal)}")

    patch = OUTPUT_MODELS[node].model_validate(raw_patch, strict=True)
    candidate = previous["state"] | patch.model_dump(mode="json")
    payload = json.dumps(candidate, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(
        (previous["digest"] + payload).encode()
    ).hexdigest()
    signature = hmac.new(signing_key, digest.encode(), "sha256").hexdigest()

    snapshot = {
        "state": candidate,
        "digest": digest,
        "parent": previous["digest"],
        "node": node,
        "signature": signature,
    }
    store.put_if_absent(digest, snapshot)
    return snapshot

This resembles a Git commit for data: the digest identifies the content and links it to a parent. Before a privileged node runs, verify the signature and confirm that its checkpoint descends from the last trusted state. An illegal patch is rejected before commit; a later anomaly can restore the parent snapshot and replay from the last safe checkpoint.

Signing alone does not make a transition legitimate. It proves that the transition service recorded it. The allowlist, schemas, authenticated node identity, monotonic version, and compare-and-swap write prevent stale workers or parallel branches from authorizing each other. Store rejected attempts in a separate audit log without letting attacker content become executable state.

05 // Human-in-the-loop state gates

Bind approval to the exact risky transition

Some changes should pause even when correctly typed: a new target user, a broader write scope, destructive database action, or an authority change. LangGraph's interrupt() persists the thread through a checkpointer and surfaces a JSON-serializable request. The graph resumes with Command(resume=...) using the same thread_id.

approval_gate.pyDigest-bound review

from pydantic import BaseModel, ConfigDict
from langgraph.types import Command, interrupt

class Approval(BaseModel):
    model_config = ConfigDict(extra="forbid", strict=True)
    approved: bool
    transition_digest: str
    reviewer_id: str

def approval_gate(state):
    tx = state["pending_transition"]
    risky = {"target_user_id", "execution_permissions", "write_scope"}

    if not (set(tx["changed_keys"]) & risky):
        return Command(goto="database_writer")

    answer = Approval.model_validate(interrupt({
        "kind": "state_transition_review",
        "transition_digest": tx["digest"],
        "changed_keys": tx["changed_keys"],
        "redacted_diff": tx["redacted_diff"],
    }), strict=True)

    if not answer.approved or answer.transition_digest != tx["digest"]:
        return Command(goto="cancel")

    receipt = sign_approval(answer.model_dump())
    return Command(update={"approval_receipt": receipt},
                   goto="database_writer")

The application hosting the graph should detect the interrupt and send an idempotent webhook to an authenticated review service. The reviewer sees a redacted diff and the transition digest; the resume payload must contain that same digest, reviewer identity, decision, expiry, and nonce. A stale approval must not authorize a newer state.

LangGraph restarts the interrupted node from its beginning when resumed, so code before interrupt() may execute again. Keep outbound notifications outside the node or make them idempotent. Most importantly, the database writer must independently verify the signed state chain and approval receipt immediately before the write. A routing edge is not proof of authorization.

06 // Zero-trust state machine

Make privilege escalation structurally impossible

Minimize reads

Project only the fields a node needs; never serialize the complete context to an untrusted worker.

Authorize writes

Validate node identity, key allowlists, Pydantic output schemas, and semantic invariants centrally.

Commit transitions

Chain signed content digests, enforce versions, and retain a safe parent for replay or rollback.

Gate dangerous diffs

Pause on protected-key changes and bind human approval to one immutable transition digest.

Test the negative paths: make the parser return protected keys, wrong types, oversized strings, stale versions, forged signatures, mismatched approval digests, and parallel updates to the same thread. The database write must remain unreachable in every case.

Multi-agent systems do not fail only when a model follows a hostile prompt. They fail when software gives persuasive output authority it never earned. Treat graph state as a security-critical protocol: narrow inputs, capability-scoped patches, policy-owned transitions, immutable history, and explicit approval before privilege crosses a boundary.

Sources // official documentation