Agent Quick Start

NEW in v0.3.0+ - The Agent Abstraction Layer provides natural language automation with 4 levels of control, from simple commands to full conversational AI.

Overview

The Sentience SDK offers multiple levels of abstraction for browser automation:

| Level | Use Case | Code Reduction | Requirements |
|---|---|---|---|
| Level 1: Raw Playwright | Maximum control, edge cases | 0% | None |
| Level 2: Direct SDK | Precise control, debugging | 80% | Sentience API key |
| Level 3: SentienceAgent | Quick automation, step-by-step | 95% | Sentience + LLM API key |
| Level 4: ConversationalAgent | Complex tasks, chatbots | 99% | Sentience + LLM API key |

Quick Tip: Start with Level 3 (SentienceAgent) for most automation tasks. Upgrade to Level 4 (ConversationalAgent) when you need multi-step planning or conversational interfaces.

Level 1: Raw Playwright - Maximum Control

Use Playwright directly with CSS selectors - no LLM or Sentience API key required:

from playwright.sync_api import sync_playwright

# Pure Playwright - no Sentience SDK
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()

    # Navigate
    page.goto("https://amazon.com")

    # Find elements with CSS selectors
    page.locator('input[id="twotabsearchtextbox"]').fill("wireless mouse")
    page.locator('input[id="nav-search-submit-button"]').click()
    page.wait_for_selector('.s-result-item')

    # Click first result
    page.locator('.s-result-item').first.click()

    browser.close()

When to use Level 1:

  - You need maximum control or are handling edge cases
  - You want zero API costs (no Sentience or LLM keys required)

Limitations:

  - CSS selectors are brittle and break when page IDs or layout change
  - The most code to write and maintain (~15 lines for this task)

Level 2: Direct SDK - Semantic Queries

Use Sentience SDK for semantic element finding without LLMs:

from sentience import SentienceBrowser, snapshot, find, click, type_text, press

# Sentience SDK - semantic queries, no LLM
with SentienceBrowser(api_key="your_key") as browser:
    browser.page.goto("https://amazon.com")

    # Semantic element finding (no CSS selectors!)
    snap = snapshot(browser)
    search_box = find(snap, "role=textbox text~'search'")
    type_text(browser, search_box.id, "wireless mouse")
    press(browser, "Enter")

    # Re-snapshot to capture the results page
    snap = snapshot(browser)
    first_result = find(snap, "role=link importance>500")
    click(browser, first_result.id)
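
The selector strings passed to find() above form a small filter language. As a rough illustration, the sketch below parses such strings into (field, operator, value) triples; the grammar is inferred only from the two examples in this guide (role=, text~, importance>) and is not an official specification.

```python
import re

def parse_query(query: str) -> list[tuple[str, str, str]]:
    """Split a query like "role=link importance>500" into
    (field, operator, value) triples. Inferred grammar, for
    illustration only."""
    token = re.compile(r"(\w+)([=~><])('[^']*'|\S+)")
    filters = []
    for match in token.finditer(query):
        field, op, value = match.groups()
        filters.append((field, op, value.strip("'")))
    return filters

print(parse_query("role=textbox text~'search'"))
# [('role', '=', 'textbox'), ('text', '~', 'search')]
```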

Benefits over Level 1:

  - Semantic queries (role, text, importance) replace brittle CSS selectors
  - No LLM required, so no token costs and fully deterministic runs

When to use Level 2:

  - You want precise, step-by-step control and easy debugging
  - You are running repetitive tasks where LLM calls add cost without benefit

Level 3: SentienceAgent - Natural Language Commands

Use single natural language commands - the agent handles the rest:

from sentience import SentienceBrowser, SentienceAgent
from sentience.llm import OpenAIProvider

# 1. Create browser and LLM provider
browser = SentienceBrowser(api_key="your_sentience_key")
llm = OpenAIProvider(api_key="your_openai_key", model="gpt-4o")

# 2. Create agent
agent = SentienceAgent(browser, llm)

# 3. Navigate and use natural language commands
browser.page.goto("https://amazon.com")
agent.act("Click the search box")
agent.act("Type 'wireless mouse' into the search field")
agent.act("Press Enter key")
agent.act("Click the first product result")

# Check token usage
print(f"Tokens used: {agent.get_token_stats()['total_tokens']}")
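
Because each act() call spends LLM tokens, long command sequences can be guarded with the token stats shown above. The helper below is an illustrative sketch: StubAgent is a hypothetical stand-in, and any object exposing act() and get_token_stats() in the shape shown above could be dropped in its place.

```python
class StubAgent:
    """Hypothetical stand-in mimicking the agent's token accounting."""
    def __init__(self):
        self._tokens = 0

    def act(self, command: str) -> None:
        self._tokens += 400  # pretend each command costs ~400 tokens

    def get_token_stats(self) -> dict:
        return {"total_tokens": self._tokens}

def run_with_budget(agent, commands, max_tokens=1_000):
    """Execute commands in order, stopping once the budget is spent."""
    done = []
    for command in commands:
        if agent.get_token_stats()["total_tokens"] >= max_tokens:
            break
        agent.act(command)
        done.append(command)
    return done

agent = StubAgent()
ran = run_with_budget(agent, ["Click the search box",
                              "Type 'wireless mouse'",
                              "Press Enter key",
                              "Click the first product result"])
print(len(ran))  # 3 - the fourth command would exceed the budget
```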

Level 4: ConversationalAgent - Full Automation (Maximum Convenience)

ONE command does everything - automatic planning and execution:

from sentience import SentienceBrowser, ConversationalAgent
from sentience.llm import OpenAIProvider

# 1. Setup
browser = SentienceBrowser(api_key="your_sentience_key")
llm = OpenAIProvider(api_key="your_openai_key", model="gpt-4o")
agent = ConversationalAgent(browser, llm)

# 2. Natural language - agent plans and executes automatically
browser.page.goto("https://amazon.com")
response = agent.execute(
    "Search for wireless mouse and tell me the price of the top result"
)
print(response)  # "I found the top result for wireless mouse. It's priced at $24.99..."

# 3. Follow-up questions maintain context
follow_up = agent.chat("Add it to cart")
print(follow_up)

# 4. Get conversation summary
summary = agent.get_summary()
print(summary)
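
Follow-ups like "Add it to cart" work because the agent carries conversation history into each LLM call. The sketch below shows that idea with a minimal, hypothetical memory class; it is a stand-in for illustration, not the real ConversationalAgent internals.

```python
class ConversationMemory:
    """Hypothetical sketch of conversational context carry-over."""
    def __init__(self):
        self.turns = []

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))

    def prompt(self, new_message: str) -> str:
        """Build an LLM prompt that includes all prior turns."""
        history = "\n".join(f"{role}: {text}" for role, text in self.turns)
        return f"{history}\nuser: {new_message}"

memory = ConversationMemory()
memory.add("user", "Search for wireless mouse and tell me the price")
memory.add("assistant", "The top result is priced at $24.99")
# The follow-up prompt still contains the price, so the LLM can
# resolve what "it" refers to:
print("$24.99" in memory.prompt("Add it to cart"))  # True
```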

Available LLM Providers

# OpenAI (GPT-4, GPT-4o, etc.)
from sentience.llm import OpenAIProvider
llm = OpenAIProvider(api_key="sk_...", model="gpt-4o")

# Anthropic (Claude)
from sentience.llm import AnthropicProvider
llm = AnthropicProvider(api_key="sk_...", model="claude-3-5-sonnet-20241022")

# Google Gemini
from sentience.llm import GeminiProvider
llm = GeminiProvider(api_key="your_gemini_key", model="gemini-pro")

# GLM (ChatGLM, GLM-4, etc.)
from sentience.llm import GLMProvider
llm = GLMProvider(api_key="your_glm_key", model="glm-4")

# Local LLM (e.g., Qwen, Llama, etc.)
from sentience.llm import LocalLLMProvider
llm = LocalLLMProvider(base_url="http://localhost:8000/v1", model="Qwen/Qwen2.5-3B-Instruct")

When to Use Each Level

Use Raw Playwright (Level 1) when:

  - You need maximum control, are handling edge cases, or want zero API costs

Use Direct SDK (Level 2) when:

  - You want precise step-by-step control, easy debugging, or LLM-free repetitive automation

Use SentienceAgent (Level 3) when:

  - You want quick automation driven by single natural language commands

Use ConversationalAgent (Level 4) when:

  - You need multi-step planning, conversational interfaces, or complex tasks handled in one command

Cost Comparison

Understanding the cost and complexity tradeoffs between levels:

Lines of Code Comparison

Same task: "Search Amazon for wireless mouse and click first result"

| Level | Lines of Code | Complexity | Credits Used | LLM Tokens |
|---|---|---|---|---|
| Level 1 | ~15 lines | High (CSS selectors) | 0 | 0 |
| Level 2 | ~10 lines | Medium (semantic queries) | ~2-4 | 0 |
| Level 3 | ~5 lines | Low (natural language) | ~2-4 | ~1,500 |
| Level 4 | ~3 lines | Very Low (one command) | ~2-4 | ~2,500 |

Token Cost Analysis (Level 3 vs Level 4)

Level 3: SentienceAgent - each act() call is a separate LLM request, totaling roughly 1,500 tokens for the task above

Level 4: ConversationalAgent - adds an automatic planning step on top of execution, totaling roughly 2,500 tokens for the same task
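
Using the per-task token counts from the table above, the LLM spend can be estimated with simple arithmetic. The blended $4-per-million-token rate below is an assumption chosen to reproduce this guide's cost figures, not an official price for any provider.

```python
def llm_cost(tokens: int, usd_per_million: float = 4.0) -> float:
    """Estimate LLM spend for one task from its token count,
    assuming a blended input+output rate (illustrative only)."""
    return round(tokens * usd_per_million / 1_000_000, 3)

level3 = llm_cost(1_500)  # ~1,500 tokens per task at Level 3
level4 = llm_cost(2_500)  # ~2,500 tokens per task at Level 4
print(level3, level4)     # 0.006 0.01
```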

Credit Cost Breakdown

Sentience API Credits (Levels 2-4; Level 1 never calls the Sentience API): ~2-4 credits per task, about $0.004

LLM Costs (Levels 3 & 4 only): roughly $0.006 per task at Level 3 and $0.010 at Level 4 with gpt-4o, or $0 with a local LLM

Total Cost Per Task

| Level | Sentience Credits | LLM Cost | Total Cost |
|---|---|---|---|
| Level 1 | $0 | $0 | $0 |
| Level 2 | $0.004 | $0 | $0.004 |
| Level 3 | $0.004 | $0.006 | $0.010 |
| Level 4 | $0.004 | $0.010 | $0.014 |
| Level 3 (Local LLM) | $0.004 | $0 | $0.004 |

Cost Optimization Tips:

  1. Use Level 2 for repetitive tasks (no LLM costs)
  2. Use Level 3 with local LLM for zero LLM costs
  3. Use use_api=False in snapshots to avoid credit usage (free tier)
  4. Batch similar tasks to minimize LLM context switching

Next Steps