
Debugging Agent Failures

Updated 2026-01-18 — Complete evidence capture for AI agent failures with the Failure Artifact Buffer.

When agents fail, you need evidence — not guesses. The Failure Artifact Buffer automatically captures everything leading up to a failure: screenshots, snapshots, diagnostics, and optional video clips.


The Problem with Agent Debugging

Traditional debugging approaches fall short for AI agents: logs and stack traces record the moment of failure, but not the on-screen context and sequence of actions that led up to it.

The Failure Artifact Buffer solves this with a ring buffer that efficiently captures the last N seconds of activity, persisting only when something goes wrong.

Evidence-based debugging

See the 15 seconds leading up to failure, not just the moment of failure. Frames are captured as JPEG to temp storage and persisted only when assertions fail.
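The ring-buffer idea behind this can be sketched in a few lines. This is a simplified illustration, not the SDK's internal implementation; the `Frame` type and the eviction policy are assumptions:

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Frame:
    captured_at: float  # Unix timestamp of the capture
    jpeg_path: str      # Path to the JPEG in temp storage


class FrameRingBuffer:
    """Keep only the frames captured within the last `buffer_seconds`."""

    def __init__(self, buffer_seconds: float = 15.0):
        self.buffer_seconds = buffer_seconds
        self._frames: deque = deque()

    def add(self, frame: Frame) -> None:
        self._frames.append(frame)
        # Evict frames that have aged out of the window; on failure,
        # whatever remains is what gets written to the artifact bundle.
        while self._frames and frame.captured_at - self._frames[0].captured_at > self.buffer_seconds:
            self._frames.popleft()

    def frames(self) -> list:
        return list(self._frames)
```

Old frames fall out of the window as new ones arrive, so memory and disk use stay bounded no matter how long the agent runs.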


What Gets Captured

On assertion failure, the SDK persists a complete artifact bundle:

File              Contents
manifest.json     Index with run metadata, status, timestamps, and file list
snapshot.json     The browser snapshot at failure time (PII redacted)
diagnostics.json  Snapshot confidence scores, DOM metrics, and reason codes
steps.json        Timeline of actions and assertions with outcomes
frames/           JPEG screenshots from the ring buffer (last 15 seconds)
failure.mp4       Optional video clip generated from frames (requires ffmpeg)

Quick Start

Python

from sentience import AgentRuntime
from sentience.failure_artifacts import FailureArtifactBuffer, FailureArtifactsOptions, ClipOptions

# Configure the failure artifact buffer
options = FailureArtifactsOptions(
    buffer_seconds=15,           # Keep last 15 seconds of frames
    capture_on_action=True,      # Capture after every action
    fps=0,                       # Optional: timer-based capture (0 = off)
    persist_mode="on_fail",      # Only persist when assertions fail
    output_dir=".sentience/artifacts",
    clip=ClipOptions(
        mode="auto",             # Generate clip if ffmpeg available
        fps=8,                   # Video framerate
    ),
)

# Enable on your runtime
runtime = AgentRuntime(backend=backend, tracer=tracer)
artifact_buffer = FailureArtifactBuffer(options)
runtime.set_artifact_buffer(artifact_buffer)

# Run your agent... if an assertion fails, artifacts are automatically persisted

Configuration Options

Option                                         Type      Default                 Description
buffer_seconds / bufferSeconds                 number    15                      Duration of frame history to keep
capture_on_action / captureOnAction            boolean   true                    Capture screenshot after each action
fps                                            number    0                       Timer-based capture rate (0 = disabled)
persist_mode / persistMode                     string    "on_fail"               When to persist: "on_fail" or "always"
output_dir / outputDir                         string    ".sentience/artifacts"  Where to write artifact bundles
redact_snapshot_values / redactSnapshotValues  boolean   true                    Auto-redact password/email/tel input values
on_before_persist / onBeforePersist            callback  null                    Custom redaction callback
clip.mode                                      string    "auto"                  Video generation: "off", "auto", "on"
clip.fps                                       number    8                       Video framerate for clip generation
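For example, an audit-trail setup that persists a bundle for every run rather than only on failure, composed only of options from the table above:

```python
from sentience.failure_artifacts import FailureArtifactsOptions, ClipOptions

# Persist a bundle for every run, keep a longer history, skip video generation
options = FailureArtifactsOptions(
    persist_mode="always",        # "on_fail" is the default
    buffer_seconds=30,
    clip=ClipOptions(mode="off"), # no ffmpeg needed
)
```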

Automatic PII Redaction

Privacy-safe by default

The SDK includes built-in PII protection that runs automatically before any artifact is written to disk.

What gets redacted by default: values entered into password, email, and tel input fields.

This ensures sensitive user input never leaves your machine unless you explicitly disable it.

# Python - Default behavior (redaction ON)
options = FailureArtifactsOptions(
    redact_snapshot_values=True,  # This is the default
)

# To disable default redaction (not recommended):
options = FailureArtifactsOptions(
    redact_snapshot_values=False,
)
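Conceptually, the default pass works like the rough sketch below. The actual SDK logic may differ, and the element shape (`input_type`, `value`) is an assumption for illustration:

```python
# Input types the SDK redacts by default: password, email, tel
SENSITIVE_INPUT_TYPES = {"password", "email", "tel"}


def redact_snapshot(snapshot: dict) -> dict:
    """Blank out values of sensitive inputs before anything is written to disk."""
    for el in snapshot.get("elements", []):
        if el.get("input_type") in SENSITIVE_INPUT_TYPES and el.get("value"):
            el["value"] = None
            el["value_redacted"] = True  # flag so consumers know a value existed
    return snapshot
```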

Custom Redaction Callback

For advanced use cases, you can provide a custom redaction callback that runs after default redaction.

from sentience.failure_artifacts import (
    FailureArtifactsOptions,
    RedactionContext,
    RedactionResult,
)

def my_custom_redactor(ctx: RedactionContext) -> RedactionResult:
    """
    Custom redaction callback.

    ctx contains:
      - run_id: str - The run identifier
      - reason: str | None - Failure reason (e.g., "assertion_failed")
      - status: "failure" | "success" - Run outcome
      - snapshot: dict | None - The browser snapshot (already default-redacted)
      - diagnostics: dict | None - Snapshot diagnostics
      - frame_paths: list[str] - Paths to captured frame images
      - metadata: dict - Additional metadata from persist() call

    Returns RedactionResult with:
      - snapshot: Modified snapshot (or None to keep original)
      - diagnostics: Modified diagnostics (or None to keep original)
      - frame_paths: Modified frame paths (or None to keep original)
      - drop_frames: If True, don't persist any frames
    """
    # Example: Redact additional fields containing "ssn" or "credit"
    snapshot = ctx.snapshot
    if snapshot and "elements" in snapshot:
        for el in snapshot["elements"]:
            name = (el.get("name") or el.get("id") or "").lower()
            if "ssn" in name or "credit" in name:
                el["value"] = None
                el["value_redacted"] = True

    return RedactionResult(
        snapshot=snapshot,
        drop_frames=False,  # Set True to exclude all frames
    )

options = FailureArtifactsOptions(
    on_before_persist=my_custom_redactor,
)

Typical callback use cases: redacting domain-specific fields (as in the example above), or setting drop_frames=True to exclude screenshots entirely.


Cloud Upload

Upload artifact bundles to Sentience cloud storage for team access and long-term retention:

# Python
artifact_index_key = artifact_buffer.upload_to_cloud(
    api_key="sk-...",
    api_url="https://api.sentience.com",  # Optional
    persisted_dir=Path(".sentience/artifacts/run-abc123"),  # Optional: specific run
)

The upload_to_cloud() method:

  1. Requests presigned upload URLs from the gateway (POST /v1/traces/artifacts/init)
  2. Uploads all artifact files directly to object storage
  3. Creates an index.json manifest linking all artifacts
  4. Reports upload stats to the gateway (POST /v1/traces/artifacts/complete)
  5. Returns the artifact_index_key for linking to trace metadata
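Step 1 starts from a listing of every file in the persisted bundle. A sketch of building that listing (the function name and the listing shape are assumptions; the real payloads are defined by the gateway):

```python
from pathlib import Path


def collect_artifact_files(bundle_dir: Path) -> list:
    """Enumerate every file in a persisted bundle, relative to its root.

    upload_to_cloud() would send a listing like this to the init endpoint
    to request one presigned URL per file, PUT each file to object
    storage, then report the sizes to the complete endpoint.
    """
    return sorted(
        (
            {
                "name": p.relative_to(bundle_dir).as_posix(),
                "size_bytes": p.stat().st_size,
            }
            for p in bundle_dir.rglob("*")
            if p.is_file()
        ),
        key=lambda f: f["name"],
    )
```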

Artifact Bundle Structure

.sentience/artifacts/
└── run-abc123-def456/
    ├── manifest.json       # Index with metadata
    ├── snapshot.json       # Browser snapshot at failure (redacted)
    ├── diagnostics.json    # Confidence scores, DOM metrics
    ├── steps.json          # Action/assertion timeline
    ├── frames/
    │   ├── frame_001.jpeg  # Screenshots from ring buffer
    │   ├── frame_002.jpeg
    │   └── ...
    └── failure.mp4         # Optional video clip

manifest.json Example

{
  "version": 1,
  "run_id": "abc123-def456",
  "status": "failure",
  "started_at": "2026-01-18T10:30:00.000Z",
  "ended_at": "2026-01-18T10:30:15.500Z",
  "failure_reason": "assertion_failed",
  "assertion_label": "Login button should be visible",
  "url_at_failure": "https://example.com/login",
  "artifacts": [
    { "name": "snapshot.json", "size_bytes": 45678 },
    { "name": "diagnostics.json", "size_bytes": 1234 },
    { "name": "steps.json", "size_bytes": 8901 },
    { "name": "failure.mp4", "size_bytes": 234567 }
  ],
  "frame_count": 45,
  "buffer_seconds": 15
}
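Because the manifest is plain JSON, summarizing a bundle (say, for a CI log line) takes only a few lines. A sketch, with field names following the example above:

```python
import json
from pathlib import Path


def summarize_manifest(path: Path) -> str:
    """Produce a one-line summary of an artifact bundle from its manifest."""
    m = json.loads(path.read_text())
    return (
        f"run {m['run_id']}: {m['status']}"
        f" ({m.get('failure_reason', 'n/a')},"
        f" {m.get('frame_count', 0)} frames)"
    )
```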

Viewing Artifacts in Sentience Studio

Uploaded artifacts can be viewed in Sentience Studio for visual debugging.

Coming Soon

Deep artifact integration with Studio is in active development. Currently, artifacts can be viewed locally or uploaded for team sharing.


When to Use This Feature

Use Case               Why It Helps
CI/CD pipelines        Automatically capture failure evidence for failed test runs
Production monitoring  Debug agent failures without reproducing the issue
Team collaboration     Share artifact bundles with teammates or attach to bug reports
Compliance             Maintain audit trails of agent actions (with PII redacted)

Dependencies

Dependency                     Required?  Purpose
Core SDK                       Required   Basic functionality works without additional packages
pillow (Python) / canvas (TS)  Optional   Frame redaction (blurring sensitive areas)
ffmpeg                         Optional   Video clip generation (must be on PATH)
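Since ffmpeg must be on PATH, the availability check is the usual PATH lookup. A sketch of how clip mode "auto" might decide; the function name is an assumption:

```python
import shutil


def ffmpeg_available() -> bool:
    """True if an ffmpeg binary is discoverable on PATH.

    With clip.mode="auto", a check like this determines whether a
    failure.mp4 can be generated from the buffered frames; if it
    returns False, the frames are still persisted, just without video.
    """
    return shutil.which("ffmpeg") is not None
```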