Agent Quick Start

NEW in v0.3.0+ - The Agent Abstraction Layer provides natural language automation with 4 levels of control, from simple commands to full conversational AI.

Overview

The Sentience SDK offers multiple levels of abstraction for browser automation:

| Level | Use Case | Code Reduction | Requirements |
|---|---|---|---|
| Level 1: Raw Playwright | Maximum control, edge cases | 0% | None |
| Level 2: Direct SDK | Precise control, debugging | 80% | Sentience API key |
| Level 3: SentienceAgent | Quick automation, step-by-step | 95% | Sentience + LLM API key |
| Level 4: ConversationalAgent | Complex tasks, chatbots | 99% | Sentience + LLM API key |

Quick Tip: Start with Level 3 (SentienceAgent) for most automation tasks. Upgrade to Level 4 (ConversationalAgent) when you need multi-step planning or conversational interfaces.

Level 1: Raw Playwright - Maximum Control

Use Playwright directly with CSS selectors - no LLM or Sentience API key required:

from playwright.sync_api import sync_playwright

# Pure Playwright - no Sentience SDK
with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()

    # Navigate
    page.goto("https://amazon.com")

    # Find elements with CSS selectors
    page.locator('input[id="twotabsearchtextbox"]').fill("wireless mouse")
    page.locator('input[id="nav-search-submit-button"]').click()
    page.wait_for_selector('.s-result-item')

    # Click first result
    page.locator('.s-result-item').first.click()

    browser.close()

When to use Level 1:

  - You need maximum control or are handling edge cases
  - You want zero API costs (no Sentience or LLM keys required)

Limitations:

  - CSS selectors are brittle and break when page IDs or layout change
  - The most code to write and maintain (~15 lines for this task)

Level 2: Direct SDK - Semantic Queries

Use Sentience SDK for semantic element finding without LLMs:

from sentience import SentienceBrowser, snapshot, find, click, type_text, press

# Sentience SDK - semantic queries, no LLM
with SentienceBrowser(api_key="your_key") as browser:
    browser.page.goto("https://amazon.com")

    # Semantic element finding (no CSS selectors!)
    snap = snapshot(browser)
    search_box = find(snap, "role=textbox text~'search'")
    type_text(browser, search_box.id, "wireless mouse")
    press(browser, "Enter")

    # Re-snapshot to capture the results page
    snap = snapshot(browser)
    first_result = find(snap, "role=link importance>500")
    click(browser, first_result.id)
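
The selector strings passed to find() above form a small filter language. As a rough illustration, the sketch below parses such strings into (field, operator, value) triples; the grammar is inferred only from the two examples in this guide (role=, text~, importance>) and is not an official specification.

```python
import re

def parse_query(query: str) -> list[tuple[str, str, str]]:
    """Split a query like "role=link importance>500" into
    (field, operator, value) triples. Inferred grammar, for
    illustration only."""
    token = re.compile(r"(\w+)([=~><])('[^']*'|\S+)")
    filters = []
    for match in token.finditer(query):
        field, op, value = match.groups()
        filters.append((field, op, value.strip("'")))
    return filters

print(parse_query("role=textbox text~'search'"))
# [('role', '=', 'textbox'), ('text', '~', 'search')]
```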

Benefits over Level 1:

  - Semantic queries (role, text, importance) replace brittle CSS selectors
  - No LLM required, so no token costs and fully deterministic runs

When to use Level 2:

  - You want precise, step-by-step control and easy debugging
  - You are running repetitive tasks where LLM calls add cost without benefit

Level 3: SentienceAgent - Natural Language Commands

Use single natural language commands - the agent handles the rest:

from sentience import SentienceBrowser, SentienceAgent
from sentience.llm import OpenAIProvider

# 1. Create browser and LLM provider
browser = SentienceBrowser(api_key="your_sentience_key")
llm = OpenAIProvider(api_key="your_openai_key", model="gpt-4o")

# 2. Create agent
agent = SentienceAgent(browser, llm)

# 3. Navigate and use natural language commands
browser.page.goto("https://amazon.com")
agent.act("Click the search box")
agent.act("Type 'wireless mouse' into the search field")
agent.act("Press Enter key")
agent.act("Click the first product result")

# Check token usage
print(f"Tokens used: {agent.get_token_stats()['total_tokens']}")
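
Because each act() call spends LLM tokens, long command sequences can be guarded with the token stats shown above. The helper below is an illustrative sketch: StubAgent is a hypothetical stand-in, and any object exposing act() and get_token_stats() in the shape shown above could be dropped in its place.

```python
class StubAgent:
    """Hypothetical stand-in mimicking the agent's token accounting."""
    def __init__(self):
        self._tokens = 0

    def act(self, command: str) -> None:
        self._tokens += 400  # pretend each command costs ~400 tokens

    def get_token_stats(self) -> dict:
        return {"total_tokens": self._tokens}

def run_with_budget(agent, commands, max_tokens=1_000):
    """Execute commands in order, stopping once the budget is spent."""
    done = []
    for command in commands:
        if agent.get_token_stats()["total_tokens"] >= max_tokens:
            break
        agent.act(command)
        done.append(command)
    return done

agent = StubAgent()
ran = run_with_budget(agent, ["Click the search box",
                              "Type 'wireless mouse'",
                              "Press Enter key",
                              "Click the first product result"])
print(len(ran))  # 3 - the fourth command would exceed the budget
```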

Level 4: ConversationalAgent - Full Automation (Maximum Convenience)

ONE command does everything - automatic planning and execution:

from sentience import SentienceBrowser, ConversationalAgent
from sentience.llm import OpenAIProvider

# 1. Setup
browser = SentienceBrowser(api_key="your_sentience_key")
llm = OpenAIProvider(api_key="your_openai_key", model="gpt-4o")
agent = ConversationalAgent(browser, llm)

# 2. Natural language - agent plans and executes automatically
browser.page.goto("https://amazon.com")
response = agent.execute(
    "Search for wireless mouse and tell me the price of the top result"
)
print(response)  # "I found the top result for wireless mouse. It's priced at $24.99..."

# 3. Follow-up questions maintain context
follow_up = agent.chat("Add it to cart")
print(follow_up)

# 4. Get conversation summary
summary = agent.get_summary()
print(summary)
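
Follow-ups like "Add it to cart" work because the agent carries conversation history into each LLM call. The sketch below shows that idea with a minimal, hypothetical memory class; it is a stand-in for illustration, not the real ConversationalAgent internals.

```python
class ConversationMemory:
    """Hypothetical sketch of conversational context carry-over."""
    def __init__(self):
        self.turns = []

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))

    def prompt(self, new_message: str) -> str:
        """Build an LLM prompt that includes all prior turns."""
        history = "\n".join(f"{role}: {text}" for role, text in self.turns)
        return f"{history}\nuser: {new_message}"

memory = ConversationMemory()
memory.add("user", "Search for wireless mouse and tell me the price")
memory.add("assistant", "The top result is priced at $24.99")
# The follow-up prompt still contains the price, so the LLM can
# resolve what "it" refers to:
print("$24.99" in memory.prompt("Add it to cart"))  # True
```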

Available LLM Providers

# OpenAI (GPT-4, GPT-4o, etc.)
from sentience.llm import OpenAIProvider
llm = OpenAIProvider(api_key="sk_...", model="gpt-4o")

# Anthropic (Claude)
from sentience.llm import AnthropicProvider
llm = AnthropicProvider(api_key="sk_...", model="claude-3-5-sonnet-20241022")

# Google Gemini
from sentience.llm import GeminiProvider
llm = GeminiProvider(api_key="your_gemini_key", model="gemini-pro")

# GLM (ChatGLM, GLM-4, etc.)
from sentience.llm import GLMProvider
llm = GLMProvider(api_key="your_glm_key", model="glm-4")

# Local LLM (e.g., Qwen, Llama, etc.)
from sentience.llm import LocalLLMProvider
llm = LocalLLMProvider(base_url="http://localhost:8000/v1", model="Qwen/Qwen2.5-3B-Instruct")

When to Use Each Level

Use Raw Playwright (Level 1) when:

  - You need maximum control, are handling edge cases, or want zero API costs

Use Direct SDK (Level 2) when:

  - You want precise step-by-step control, easy debugging, or LLM-free repetitive automation

Use SentienceAgent (Level 3) when:

  - You want quick automation driven by single natural language commands

Use ConversationalAgent (Level 4) when:

  - You need multi-step planning, conversational interfaces, or complex tasks handled in one command

Cost Comparison

Understanding the cost and complexity tradeoffs between levels:

Lines of Code Comparison

Same task: "Search Amazon for wireless mouse and click first result"

| Level | Lines of Code | Complexity | Credits Used | LLM Tokens |
|---|---|---|---|---|
| Level 1 | ~15 lines | High (CSS selectors) | 0 | 0 |
| Level 2 | ~10 lines | Medium (semantic queries) | ~2-4 | 0 |
| Level 3 | ~5 lines | Low (natural language) | ~2-4 | ~1,500 |
| Level 4 | ~3 lines | Very Low (one command) | ~2-4 | ~2,500 |

Token Cost Analysis (Level 3 vs Level 4)

Level 3: SentienceAgent - each act() call is a separate LLM request, totaling roughly 1,500 tokens for the task above

Level 4: ConversationalAgent - adds an automatic planning step on top of execution, totaling roughly 2,500 tokens for the same task
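
Using the per-task token counts from the table above, the LLM spend can be estimated with simple arithmetic. The blended $4-per-million-token rate below is an assumption chosen to reproduce this guide's cost figures, not an official price for any provider.

```python
def llm_cost(tokens: int, usd_per_million: float = 4.0) -> float:
    """Estimate LLM spend for one task from its token count,
    assuming a blended input+output rate (illustrative only)."""
    return round(tokens * usd_per_million / 1_000_000, 3)

level3 = llm_cost(1_500)  # ~1,500 tokens per task at Level 3
level4 = llm_cost(2_500)  # ~2,500 tokens per task at Level 4
print(level3, level4)     # 0.006 0.01
```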

Credit Cost Breakdown

Sentience API Credits (Levels 2-4; Level 1 never calls the Sentience API): ~2-4 credits per task, about $0.004

LLM Costs (Levels 3 & 4 only): roughly $0.006 per task at Level 3 and $0.010 at Level 4 with gpt-4o, or $0 with a local LLM

Total Cost Per Task

| Level | Sentience Credits | LLM Cost | Total Cost |
|---|---|---|---|
| Level 1 | $0 | $0 | $0 |
| Level 2 | $0.004 | $0 | $0.004 |
| Level 3 | $0.004 | $0.006 | $0.010 |
| Level 4 | $0.004 | $0.010 | $0.014 |
| Level 3 (Local LLM) | $0.004 | $0 | $0.004 |

Cost Optimization Tips:

  1. Use Level 2 for repetitive tasks (no LLM costs)
  2. Use Level 3 with local LLM for zero LLM costs
  3. Use use_api=False in snapshots to avoid credit usage (free tier)
  4. Batch similar tasks to minimize LLM context switching

Next Steps