
Debugging Agent Failures

Updated 2026-01-18 — Complete evidence capture for AI agent failures with the Failure Artifact Buffer.

When agents fail, you need evidence — not guesses. The Failure Artifact Buffer automatically captures everything leading up to a failure: screenshots, snapshots, diagnostics, and optional video clips.


The Problem with Agent Debugging

Traditional debugging approaches fall short for AI agents: logs and stack traces record the moment of failure, but not the on-screen context and sequence of actions that led up to it.

The Failure Artifact Buffer solves this with a ring buffer that efficiently captures the last N seconds of activity, persisting only when something goes wrong.

Evidence-based debugging

See the 15 seconds leading up to failure, not just the moment of failure. Frames are captured as JPEG to temp storage and persisted only when assertions fail.
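The ring-buffer idea behind this can be sketched in a few lines. This is a simplified illustration, not the SDK's internal implementation; the `Frame` type and the eviction policy are assumptions:

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Frame:
    captured_at: float  # Unix timestamp of the capture
    jpeg_path: str      # Path to the JPEG in temp storage


class FrameRingBuffer:
    """Keep only the frames captured within the last `buffer_seconds`."""

    def __init__(self, buffer_seconds: float = 15.0):
        self.buffer_seconds = buffer_seconds
        self._frames: deque = deque()

    def add(self, frame: Frame) -> None:
        self._frames.append(frame)
        # Evict frames that have aged out of the window; on failure,
        # whatever remains is what gets written to the artifact bundle.
        while self._frames and frame.captured_at - self._frames[0].captured_at > self.buffer_seconds:
            self._frames.popleft()

    def frames(self) -> list:
        return list(self._frames)
```

Old frames fall out of the window as new ones arrive, so memory and disk use stay bounded no matter how long the agent runs.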


What Gets Captured

On assertion failure, the SDK persists a complete artifact bundle:

File              Contents
manifest.json     Index with run metadata, status, timestamps, and file list
snapshot.json     The browser snapshot at failure time (PII redacted)
diagnostics.json  Snapshot confidence scores, DOM metrics, and reason codes
steps.json        Timeline of actions and assertions with outcomes
frames/           JPEG screenshots from the ring buffer (last 15 seconds)
failure.mp4       Optional video clip generated from frames (requires ffmpeg)

Quick Start

Python

from sentience import AgentRuntime
from sentience.failure_artifacts import FailureArtifactBuffer, FailureArtifactsOptions, ClipOptions

# Configure the failure artifact buffer
options = FailureArtifactsOptions(
    buffer_seconds=15,           # Keep last 15 seconds of frames
    capture_on_action=True,      # Capture after every action
    fps=0,                       # Optional: timer-based capture (0 = off)
    persist_mode="on_fail",      # Only persist when assertions fail
    output_dir=".sentience/artifacts",
    clip=ClipOptions(
        mode="auto",             # Generate clip if ffmpeg available
        fps=8,                   # Video framerate
    ),
)

# Enable on your runtime
runtime = AgentRuntime(backend=backend, tracer=tracer)
artifact_buffer = FailureArtifactBuffer(options)
runtime.set_artifact_buffer(artifact_buffer)

# Run your agent... if an assertion fails, artifacts are automatically persisted

Configuration Options

Option                                         Type      Default                 Description
buffer_seconds / bufferSeconds                 number    15                      Duration of frame history to keep
capture_on_action / captureOnAction            boolean   true                    Capture screenshot after each action
fps                                            number    0                       Timer-based capture rate (0 = disabled)
persist_mode / persistMode                     string    "on_fail"               When to persist: "on_fail" or "always"
output_dir / outputDir                         string    ".sentience/artifacts"  Where to write artifact bundles
redact_snapshot_values / redactSnapshotValues  boolean   true                    Auto-redact password/email/tel input values
on_before_persist / onBeforePersist            callback  null                    Custom redaction callback
clip.mode                                      string    "auto"                  Video generation: "off", "auto", "on"
clip.fps                                       number    8                       Video framerate for clip generation
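For example, an audit-trail setup that persists a bundle for every run rather than only on failure, composed only of options from the table above:

```python
from sentience.failure_artifacts import FailureArtifactsOptions, ClipOptions

# Persist a bundle for every run, keep a longer history, skip video generation
options = FailureArtifactsOptions(
    persist_mode="always",        # "on_fail" is the default
    buffer_seconds=30,
    clip=ClipOptions(mode="off"), # no ffmpeg needed
)
```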

Automatic PII Redaction

Privacy-safe by default

The SDK includes built-in PII protection that runs automatically before any artifact is written to disk.

What gets redacted by default: values entered into password, email, and tel input fields.

This ensures sensitive user input never leaves your machine unless you explicitly disable it.

# Python - Default behavior (redaction ON)
options = FailureArtifactsOptions(
    redact_snapshot_values=True,  # This is the default
)

# To disable default redaction (not recommended):
options = FailureArtifactsOptions(
    redact_snapshot_values=False,
)
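Conceptually, the default pass works like the rough sketch below. The actual SDK logic may differ, and the element shape (`input_type`, `value`) is an assumption for illustration:

```python
# Input types the SDK redacts by default: password, email, tel
SENSITIVE_INPUT_TYPES = {"password", "email", "tel"}


def redact_snapshot(snapshot: dict) -> dict:
    """Blank out values of sensitive inputs before anything is written to disk."""
    for el in snapshot.get("elements", []):
        if el.get("input_type") in SENSITIVE_INPUT_TYPES and el.get("value"):
            el["value"] = None
            el["value_redacted"] = True  # flag so consumers know a value existed
    return snapshot
```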

Custom Redaction Callback

For advanced use cases, you can provide a custom redaction callback that runs after default redaction.

from sentience.failure_artifacts import (
    FailureArtifactsOptions,
    RedactionContext,
    RedactionResult,
)

def my_custom_redactor(ctx: RedactionContext) -> RedactionResult:
    """
    Custom redaction callback.

    ctx contains:
      - run_id: str - The run identifier
      - reason: str | None - Failure reason (e.g., "assertion_failed")
      - status: "failure" | "success" - Run outcome
      - snapshot: dict | None - The browser snapshot (already default-redacted)
      - diagnostics: dict | None - Snapshot diagnostics
      - frame_paths: list[str] - Paths to captured frame images
      - metadata: dict - Additional metadata from persist() call

    Returns RedactionResult with:
      - snapshot: Modified snapshot (or None to keep original)
      - diagnostics: Modified diagnostics (or None to keep original)
      - frame_paths: Modified frame paths (or None to keep original)
      - drop_frames: If True, don't persist any frames
    """
    # Example: Redact additional fields containing "ssn" or "credit"
    snapshot = ctx.snapshot
    if snapshot and "elements" in snapshot:
        for el in snapshot["elements"]:
            name = (el.get("name") or el.get("id") or "").lower()
            if "ssn" in name or "credit" in name:
                el["value"] = None
                el["value_redacted"] = True

    return RedactionResult(
        snapshot=snapshot,
        drop_frames=False,  # Set True to exclude all frames
    )

options = FailureArtifactsOptions(
    on_before_persist=my_custom_redactor,
)

Typical callback use cases: redacting domain-specific fields (as in the example above), or setting drop_frames=True to exclude screenshots entirely.


Cloud Upload

Upload artifact bundles to Sentience cloud storage for team access and long-term retention:

# Python
artifact_index_key = artifact_buffer.upload_to_cloud(
    api_key="sk-...",
    api_url="https://api.sentience.com",  # Optional
    persisted_dir=Path(".sentience/artifacts/run-abc123"),  # Optional: specific run
)

The upload_to_cloud() method:

  1. Requests presigned upload URLs from the gateway (POST /v1/traces/artifacts/init)
  2. Uploads all artifact files directly to object storage
  3. Creates an index.json manifest linking all artifacts
  4. Reports upload stats to the gateway (POST /v1/traces/artifacts/complete)
  5. Returns the artifact_index_key for linking to trace metadata
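Step 1 starts from a listing of every file in the persisted bundle. A sketch of building that listing (the function name and the listing shape are assumptions; the real payloads are defined by the gateway):

```python
from pathlib import Path


def collect_artifact_files(bundle_dir: Path) -> list:
    """Enumerate every file in a persisted bundle, relative to its root.

    upload_to_cloud() would send a listing like this to the init endpoint
    to request one presigned URL per file, PUT each file to object
    storage, then report the sizes to the complete endpoint.
    """
    return sorted(
        (
            {
                "name": p.relative_to(bundle_dir).as_posix(),
                "size_bytes": p.stat().st_size,
            }
            for p in bundle_dir.rglob("*")
            if p.is_file()
        ),
        key=lambda f: f["name"],
    )
```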

Artifact Bundle Structure

.sentience/artifacts/
└── run-abc123-def456/
    ├── manifest.json       # Index with metadata
    ├── snapshot.json       # Browser snapshot at failure (redacted)
    ├── diagnostics.json    # Confidence scores, DOM metrics
    ├── steps.json          # Action/assertion timeline
    ├── frames/
    │   ├── frame_001.jpeg  # Screenshots from ring buffer
    │   ├── frame_002.jpeg
    │   └── ...
    └── failure.mp4         # Optional video clip

manifest.json Example

{
  "version": 1,
  "run_id": "abc123-def456",
  "status": "failure",
  "started_at": "2026-01-18T10:30:00.000Z",
  "ended_at": "2026-01-18T10:30:15.500Z",
  "failure_reason": "assertion_failed",
  "assertion_label": "Login button should be visible",
  "url_at_failure": "https://example.com/login",
  "artifacts": [
    { "name": "snapshot.json", "size_bytes": 45678 },
    { "name": "diagnostics.json", "size_bytes": 1234 },
    { "name": "steps.json", "size_bytes": 8901 },
    { "name": "failure.mp4", "size_bytes": 234567 }
  ],
  "frame_count": 45,
  "buffer_seconds": 15
}
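Because the manifest is plain JSON, summarizing a bundle (say, for a CI log line) takes only a few lines. A sketch, with field names following the example above:

```python
import json
from pathlib import Path


def summarize_manifest(path: Path) -> str:
    """Produce a one-line summary of an artifact bundle from its manifest."""
    m = json.loads(path.read_text())
    return (
        f"run {m['run_id']}: {m['status']}"
        f" ({m.get('failure_reason', 'n/a')},"
        f" {m.get('frame_count', 0)} frames)"
    )
```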

Viewing Artifacts in Sentience Studio

Uploaded artifacts can be viewed in Sentience Studio for visual debugging.

Coming Soon

Deep artifact integration with Studio is in active development. Currently, artifacts can be viewed locally or uploaded for team sharing.


When to Use This Feature

Use Case               Why It Helps
CI/CD pipelines        Automatically capture failure evidence for failed test runs
Production monitoring  Debug agent failures without reproducing the issue
Team collaboration     Share artifact bundles with teammates or attach to bug reports
Compliance             Maintain audit trails of agent actions (with PII redacted)

Dependencies

Dependency                     Required?  Purpose
Core SDK                       Required   Basic functionality works without additional packages
pillow (Python) / canvas (TS)  Optional   Frame redaction (blurring sensitive areas)
ffmpeg                         Optional   Video clip generation (must be on PATH)
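Since ffmpeg must be on PATH, the availability check is the usual PATH lookup. A sketch of how clip mode "auto" might decide; the function name is an assumption:

```python
import shutil


def ffmpeg_available() -> bool:
    """True if an ffmpeg binary is discoverable on PATH.

    With clip.mode="auto", a check like this determines whether a
    failure.mp4 can be generated from the buffered frames; if it
    returns False, the frames are still persisted, just without video.
    """
    return shutil.which("ffmpeg") is not None
```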