How SentienceAPI Works

From raw webpages to deterministic agent actions — with full observability.

Most tools help agents load pages or read content.
SentienceAPI helps agents act — reliably.

The Problem: In practice, agents fail far more often at execution than at reasoning

Modern LLMs are good at deciding what they want to do.
They are unreliable at deciding where to do it on a real webpage.

  • Clicking invisible or occluded elements
  • Guessing between similar buttons
  • Retrying with different prompts
  • Burning tokens on vision loops
  • Non-reproducible behavior across runs

Sentience fixes this by removing guesswork from action selection.

Where SentienceAPI Fits

SentienceAPI sits between the browser and your LLM.
It converts live webpages into action-ready targets that agents can select deterministically — without brittle selectors or probabilistic vision.

Your Agent (LLM + logic)
    Defines the goal as a natural-language intent
        ↓ natural language goal
Sentience SDK (Python / TS)
    Launches the browser, triggers snapshot(), executes actions
        ↓ raw geometry + DOM
Browser + WASM Extension
    Captures layout, visibility & occlusion, stable coordinates
        ↓ raw geometry data
Sentience Gateway (Cloud)
    Semantic grounding, heuristics + ML reranking, action-ready elements
        ↓ ranked, stable targets
Deterministic Actions
    Click / Type / Scroll, verifiable results, replayable steps
        ↓
Execution Traces & Studio
    Every snapshot and action is recorded; replay, diff, and debug any agent run; power CI-style validation

How It Works (One Pass)

1. Your Agent Defines the Goal

Your agent (LLM + logic) decides what it wants to do:

  • "Click the search input"
  • "Add the item to cart"
  • "Submit the form"

Sentience does not replace planning or reasoning.

2. The Sentience SDK Controls the Browser

Using the Sentience SDK (Python or TypeScript), your agent:

  • launches a real browser
  • navigates pages
  • requests a snapshot()

This snapshot is not HTML or pixels — it's structured geometry with semantic rankings.

3. Browser + WASM Capture Raw Geometry

A lightweight browser extension captures:

  • element bounding boxes (x, y, w, h)
  • visibility and occlusion
  • layout structure
  • stable coordinates

No inference. No guessing. Just ground truth.
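As an illustrative sketch of what one captured record might contain (the field names here are assumptions, not the extension's actual schema), geometry capture reduces each element to measured values:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ElementGeometry:
    """One captured element: measured values only, no inference."""
    node_id: int
    x: float          # bounding-box origin, CSS pixels
    y: float
    w: float          # bounding-box width
    h: float          # bounding-box height
    visible: bool     # inside the viewport and actually rendered
    occluded: bool    # covered by another element at its midpoint

    def center(self) -> tuple[float, float]:
        """A stable click coordinate: the bounding-box midpoint."""
        return (self.x + self.w / 2, self.y + self.h / 2)

search_box = ElementGeometry(node_id=42, x=320.0, y=88.0, w=480.0, h=36.0,
                             visible=True, occluded=False)
print(search_box.center())  # (560.0, 106.0)
```

Because every field is measured rather than inferred, two snapshots of the same page state yield identical records.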

4. The Sentience Gateway Grounds Actions

Raw geometry is sent to the Sentience Gateway, which:

  • assigns semantic roles (button, input, link, etc.)
  • computes visibility & occlusion explicitly
  • applies heuristics and ML reranking
  • produces a ranked, execution-ready action space

This is the intelligence layer.
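A toy version of the reranking idea, assuming a simple visibility-then-keyword-overlap score with document order as a deterministic tiebreak (the Gateway's actual heuristics and ML model are more involved):

```python
def rerank(elements: list[dict], intent_keywords: list[str]) -> list[dict]:
    """Order candidates: visible & unoccluded first, then by keyword
    overlap with the intent, with node order as a stable tiebreak."""
    def score(el):
        text = el["text"].lower()
        overlap = sum(kw in text for kw in intent_keywords)
        return (
            el["visible"] and not el["occluded"],  # actionable at all?
            overlap,                               # semantic match
            -el["node_id"],                        # earlier node wins ties
        )
    return sorted(elements, key=score, reverse=True)

candidates = [
    {"node_id": 1, "text": "Sign in", "visible": True, "occluded": False},
    {"node_id": 2, "text": "Sign in", "visible": True, "occluded": True},
    {"node_id": 3, "text": "Forgot password?", "visible": True, "occluded": False},
]
best = rerank(candidates, ["sign", "in"])[0]
print(best["node_id"])  # 1: visible, unoccluded, and matching the intent
```

Note that the occluded duplicate ranks last even though its text matches: geometry signals, not text alone, decide what is actionable.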

5. Deterministic Actions Are Executed

Your agent selects a target from the grounded action space:

  • click
  • type
  • scroll
  • wait

The SDK executes the action exactly where intended; in most cases, no retries are needed.
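The execution step can be sketched as a small deterministic dispatch table (the names here are illustrative, not the SDK's API): every action either maps to exactly one handler or fails loudly, never silently guessing.

```python
from typing import Callable

def do_click(target: str) -> str:
    """Placeholder handler: a real client would drive the browser here."""
    return f"clicked {target}"

def do_type(target: str, text: str = "") -> str:
    return f"typed {text!r} into {target}"

ACTIONS: dict[str, Callable[..., str]] = {
    "click": do_click,
    "type": do_type,
}

def execute(action: str, target: str, **kwargs) -> str:
    """Deterministic dispatch: unknown actions raise instead of guessing."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return ACTIONS[action](target, **kwargs)

print(execute("click", "el_7"))  # clicked el_7
```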

Want to see this in action?

Run a live example using the Sentience SDK — no setup required.
👉 Try it live

What Makes This Different

Sentience vs Browser Infrastructure

Browser infrastructure gives you a place to run code.

Sentience gives your agent certainty about where to act.

Without grounded action selection, agents still guess.

Sentience vs Scrapers / Read APIs

Scrapers are excellent for:

  • reading
  • summarizing
  • RAG

They do not tell an agent:

  • what is clickable
  • what is visible
  • where it is on screen

Reading ≠ acting.

Sentience is built for agents that must interact.

What the Agent Actually Receives

Instead of pixels or raw DOM, your agent gets:

  • Ranked list of actionable elements: ordered by relevance and visibility
  • Stable geometry: consistent coordinates and bounding boxes
  • Explicit signals: visibility, occlusion, primary-action cues
  • Deterministic ordering: the same input produces the same output every time

This drastically reduces:

  • Token usage
  • Retries
  • Hallucinated actions
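To illustrate why this cuts token usage, here is a sketch (field names assumed, not the actual payload schema) of serializing only the actionable, already-ranked elements into compact prompt lines, instead of sending raw DOM or screenshots:

```python
def to_prompt_lines(elements: list[dict], limit: int = 20) -> list[str]:
    """Compact, deterministic serialization of the action space for the
    LLM: only visible, unoccluded elements, highest-ranked first."""
    actionable = [e for e in elements if e["visible"] and not e["occluded"]]
    return [
        f'[{e["id"]}] {e["role"]} "{e["text"]}" @({e["x"]},{e["y"]})'
        for e in actionable[:limit]
    ]

elements = [
    {"id": 3, "role": "button", "text": "Sign in",
     "visible": True, "occluded": False, "x": 512, "y": 300},
    {"id": 9, "role": "link", "text": "Hidden promo",
     "visible": False, "occluded": False, "x": 0, "y": -400},
]
print(to_prompt_lines(elements))
# ['[3] button "Sign in" @(512,300)']
```

The invisible element never reaches the prompt, so the model cannot hallucinate an action against it.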

Built-In Observability (Sentience Studio & Traces)

Every step is recorded automatically:

  • snapshots
  • ranked targets
  • chosen action
  • execution result

These traces power:

  • step-by-step replay
  • visual debugging
  • determinism diffing
  • CI-style validation

Nothing is hidden. Nothing is guessed.
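Determinism diffing over recorded traces can be sketched like this (the trace schema here is assumed for illustration): canonicalize each recorded step and report where two runs diverge.

```python
import json

def diff_traces(run_a: list[dict], run_b: list[dict]) -> list[int]:
    """Return indices of steps whose recorded state differs between two
    runs. An empty result means the runs were step-for-step identical."""
    mismatches = [
        i for i, (a, b) in enumerate(zip(run_a, run_b))
        # sort_keys gives a canonical serialization for comparison
        if json.dumps(a, sort_keys=True) != json.dumps(b, sort_keys=True)
    ]
    # Steps present in only one run also count as divergence.
    mismatches.extend(range(min(len(run_a), len(run_b)),
                            max(len(run_a), len(run_b))))
    return mismatches

run1 = [{"action": "click", "target": 42, "result": "ok"},
        {"action": "type", "target": 7, "result": "ok"}]
run2 = [{"action": "click", "target": 42, "result": "ok"},
        {"action": "type", "target": 9, "result": "ok"}]
print(diff_traces(run1, run2))  # [1]
```

In a CI-style check, an empty diff against a golden trace is the pass condition.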

When You Should Use Sentience

Sentience is designed for:

  • Agents that must act, not just read
  • Production workflows where retries are expensive
  • Systems that need auditability and replay
  • Teams debugging real-world agent failures

If your agent only reads text, Sentience is unnecessary.

If your agent must click, type, scroll, or submit — Sentience is the missing layer.

Try It Live

If you're building agents that must act, SentienceAPI is the missing layer.

Explore interactive SDK examples, or test the API directly with real automation scenarios.

Navigate to a login page, find email/password fields semantically, and submit the form.

# No selectors. No vision. Stable semantic targets.
from sentience import SentienceBrowser, snapshot, find, click, type_text, wait_for

# Initialize browser with API key
browser = SentienceBrowser(api_key="sk_live_...")
browser.start()

# Navigate to login page
browser.page.goto("https://example.com/login")

# PERCEPTION: Find elements semantically
snap = snapshot(browser)
email_field = find(snap, "role=textbox text~'email'")
password_field = find(snap, "role=textbox text~'password'")
submit_btn = find(snap, "role=button text~'sign in'")

# ACTION: Interact with the page
type_text(browser, email_field.id, "user@example.com")
type_text(browser, password_field.id, "secure_password")
click(browser, submit_btn.id)

# VERIFICATION: Wait for navigation
wait_for(browser, "role=heading text~'Dashboard'", timeout=5.0)

print("✅ Login successful!")
browser.close()

🎯 Semantic Discovery

Find elements by role, text, and visual cues, not fragile CSS selectors

⚡ Token Optimization

Intelligent filtering reduces token usage by up to 73% vs vision models

🔒 Deterministic

Same input produces same output every time, with no random failures

SentienceAPI focuses on execution intelligence. Browser runtimes and navigation engines are intentionally decoupled.