Capture high-fidelity screenshots alongside precise element coordinates. Perfect for multimodal AI agents, visual debugging, and screenshot-based workflows.
Visual Mode combines the power of Map Mode's element coordinate mapping with high-fidelity screenshot capture. This enables multimodal AI workflows where agents can both "see" the webpage through pixels and understand its structure through geometry data.
Understanding when to use each mode
Map Mode: Fast geometry extraction for navigation and automation. No screenshot included.
Visual Mode: Geometry + screenshot for visual verification and multimodal AI workflows.
Send a POST request to /v1/observe with Visual Mode parameters
curl -X POST https://api.sentienceapi.com/v1/observe \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://app.example.com/login",
    "mode": "visual",
    "options": {
      "screenshot_delivery": "base64"
    }
  }'

To reduce token costs, limit the number of elements returned and filter by role:

curl -X POST https://api.sentienceapi.com/v1/observe \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://app.example.com/login",
    "mode": "visual",
    "options": {
      "screenshot_delivery": "base64",
      "limit": 50,
      "filter": {
        "allowed_roles": ["button", "textbox", "link"]
      }
    }
  }'

Get visual styling hints (color names, cursor type, prominence) for icon-heavy UIs. Adds ~17 tokens per element.
curl -X POST https://api.sentienceapi.com/v1/observe \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://figma.com/design/123",
    "mode": "visual",
    "options": {
      "screenshot_delivery": "base64",
      "include_visual_cues": true,
      "limit": 100
    }
  }'

💡 Use Case: Perfect for icon-heavy UIs and design tools. Returns color names, cursor type, and prominence detection to help AI agents identify icon-only buttons and primary CTAs.
Returns a presigned URL instead of inline base64, cutting the JSON payload from ~1MB to ~200KB. Well suited to OpenAI/Anthropic vision APIs, which accept image URLs directly. URLs expire in 24 hours.
curl -X POST https://api.sentienceapi.com/v1/observe \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://app.example.com/login",
    "mode": "visual",
    "options": {
      "screenshot_delivery": "url"
    }
  }'

💡 Benefits: far smaller payloads (~200KB vs ~1MB), faster JSON parsing, and native OpenAI/Anthropic vision support. Use screenshot_delivery: "url" for production AI agents.
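With URL delivery, the image bytes live behind the presigned URL rather than in the JSON, so the client downloads them separately. A minimal sketch using only the Python standard library (the screenshot object shape follows the response format documented on this page; fetch_screenshot is an illustrative helper, not part of any SDK):

```python
from urllib.request import urlopen

def fetch_screenshot(screenshot: dict) -> bytes:
    """Download the PNG bytes for a URL-delivery screenshot object."""
    if screenshot.get("type") != "url":
        raise ValueError("expected a URL-delivery screenshot")
    # Presigned URLs expire (see expires_at), so fetch promptly after the observe call
    with urlopen(screenshot["url"], timeout=30) as resp:
        return resp.read()
```

If you need the raw pixels locally (for hashing, diffing, or re-encoding), fetch them this way; if you are forwarding to a vision API that accepts URLs, you can skip the download and pass the presigned URL through directly.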
url (string, required) — The URL of the webpage to capture. Must be a valid HTTP or HTTPS URL.
mode (string, required) — Must be set to "visual" for Visual Mode.
options (object, optional) — Advanced options for fine-tuning Visual Mode behavior.
options.limit — Maximum number of elements to return. Reduces token costs when sending to vision models.
options.screenshot_delivery (NEW) — Screenshot delivery mode: "base64" (default; inline base64, ~1MB) or "url" (presigned URL, ~200KB, recommended for AI agents). URLs expire in 24h.
options.filter — Filter elements by attributes (same as Map Mode). Supports min_area, allowed_tags, allowed_roles, and min_z_index.
options.include_visual_cues — Enable visual styling hints (color names, cursor type, prominence). Adds ~17 tokens per element. Useful for icon-heavy UIs.
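Putting the parameters together, a request can be built and sent with nothing but the standard library. This is a sketch based on the parameter reference above; build_observe_payload and observe_visual are illustrative names, not SDK functions:

```python
import json
from urllib.request import Request, urlopen

def build_observe_payload(url: str, **options) -> dict:
    """Assemble a Visual Mode request body from keyword options."""
    return {"url": url, "mode": "visual", "options": options}

def observe_visual(url: str, api_key: str, **options) -> dict:
    """POST the payload to /v1/observe and decode the JSON response."""
    req = Request(
        "https://api.sentienceapi.com/v1/observe",
        data=json.dumps(build_observe_payload(url, **options)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urlopen(req, timeout=60) as resp:
        return json.load(resp)
```

For example, observe_visual("https://app.example.com/login", key, screenshot_delivery="url", limit=50) mirrors the curl request shown earlier.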
Screenshot + element coordinates in a single response
Example response with new fields: is_occluded (always included) and visual_cues (when include_visual_cues=true)
{
  "engine": "precision",
  "source": "precision_visual",
  "status": "success",
  "url": "https://app.example.com/login",
  "layout_viewport": {
    "width": 1024,
    "height": 768,
    "device_pixel_ratio": 1.0
  },
  "interactable_elements": [
    {
      "id": 1,
      "uid": "email",
      "tag": "input",
      "role": "textbox",
      "text": "",
      "selector": "input#email",
      "bbox": {
        "x": 350,
        "y": 200,
        "w": 300,
        "h": 45
      },
      "is_visible": true,
      "z_index": 1,
      "in_viewport": true,
      "is_occluded": false,
      "attributes": {
        "type": "email",
        "placeholder": "you@example.com",
        "aria_label": "Email address"
      }
    },
    {
      "id": 2,
      "uid": "password",
      "tag": "input",
      "role": "textbox",
      "text": "",
      "selector": "input#password",
      "bbox": {
        "x": 350,
        "y": 270,
        "w": 300,
        "h": 45
      },
      "is_visible": true,
      "z_index": 1,
      "in_viewport": true,
      "is_occluded": false,
      "attributes": {
        "type": "password",
        "aria_label": "Password"
      }
    },
    {
      "id": 3,
      "uid": "login-btn",
      "tag": "button",
      "role": "button",
      "text": "Sign In",
      "selector": "button#login-btn",
      "bbox": {
        "x": 350,
        "y": 340,
        "w": 300,
        "h": 50
      },
      "is_visible": true,
      "z_index": 1,
      "in_viewport": true,
      "is_occluded": false,
      "attributes": {
        "aria_label": "Sign in to your account"
      },
      "visual_cues": {
        "background_color_name": "blue",
        "color_name": "white",
        "cursor": "pointer",
        "is_primary": true
      }
    }
  ],
  "screenshot": {
    "type": "base64",
    "data": "iVBORw0KGgoAAAANSUhEUgAAA...",
    "format": "png",
    "size_bytes": 665432
  },
  "screenshot_error": null,
  "timestamp": "2025-12-18T06:26:29.812Z",
  "total_elements_extracted": 3
}

is_occluded: Always included. Indicates whether the element is covered by another element (detected via elementFromPoint raycasting).
visual_cues: Only included when include_visual_cues: true. Contains 4 fields: background_color_name, color_name, cursor, and is_primary.
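Combined with is_occluded, the visual_cues fields make it easy to shortlist likely click targets before involving a vision model. A small sketch (find_primary_ctas is an illustrative helper; it assumes the request was made with include_visual_cues: true, since visual_cues is absent otherwise):

```python
def find_primary_ctas(elements: list) -> list:
    """Return visible, unoccluded elements flagged as primary CTAs."""
    return [
        el for el in elements
        if el.get("is_visible")
        and not el.get("is_occluded")
        and el.get("visual_cues", {}).get("is_primary")
    ]
```

Applied to the sample response above, this would return only the "Sign In" button, since the two text inputs carry no visual_cues.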
The screenshot field supports two formats based on options.screenshot_delivery:
Base64 (default):

"screenshot": {
  "type": "base64",
  "data": "iVBORw0KGgoAAAANSUhEUgAAA...",
  "format": "png",
  "size_bytes": 665432
}

Presigned URL:

"screenshot": {
  "type": "url",
  "url": "https://sentience-screenshots.sfo3.digitaloceanspaces.com/screenshots/76795555-80be-4d27-84f0-73e0a0ce68c4.png?x-id=GetObject&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=7UL6GTGZBN3M2LVTRGVD%2F20251219%2Fsfo3%2Fs3%2Faws4_request&X-Amz-Date=20251219T051045Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&X-Amz-Signature=06ac3c6d6111109329417b82e35edb7438a40b29d0c806c85c5e39b7eac8cc4c",
  "format": "png",
  "size_bytes": 158277,
  "expires_at": "2025-12-20T05:10:45.401346170+00:00"
}

screenshot (UPDATED) — Polymorphic object with two formats:
- type: "base64" — contains a data field with the base64 string (default)
- type: "url" — contains a url field with a presigned URL and an expires_at timestamp
The format is controlled by options.screenshot_delivery. The legacy string format (base64 data URI) is still supported for backward compatibility.
screenshot_error — Error message if screenshot capture failed (null if successful). Geometry extraction still succeeds even if the screenshot fails.
source — Set to "precision_visual" for Visual Mode responses.
interactable_elements — Array of interactive elements with coordinates (same format as Map Mode). See the Map Mode docs for the full field reference.
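Because interactable_elements uses the same bbox geometry as Map Mode, a click target can be derived directly from the coordinates. This helper (illustrative, not part of any SDK) returns the center of the bounding box:

```python
def click_point(element: dict) -> tuple:
    """Center of an element's bbox in CSS pixels — a reasonable default click target."""
    b = element["bbox"]
    return (b["x"] + b["w"] / 2, b["y"] + b["h"] / 2)
```

For the "Sign In" button in the sample response (x=350, y=340, w=300, h=50), this yields (500.0, 365.0).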
- Send screenshots to GPT-4V or Claude 3 for visual verification: "Is this the correct product page?" or "Did the form submit successfully?"
- Detect visual CAPTCHAs that block automation by analyzing screenshots with vision models.
- Understand why automation fails by seeing the actual rendered state alongside element coordinates.
- Compare screenshots before and after deployments to detect unexpected visual changes.
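For before/after comparisons, a cheap first pass is to hash the decoded screenshot bytes: identical renders hash identically, while any pixel-level change flags the page for closer (e.g. vision-model) review. A sketch assuming base64 delivery; note that even benign changes like anti-aliasing differences will change the hash, so treat a mismatch as "needs review", not "definitely broken":

```python
import base64
import hashlib

def screenshot_digest(screenshot: dict) -> str:
    """SHA-256 of the decoded PNG bytes, for quick before/after comparison."""
    png = base64.b64decode(screenshot["data"])
    return hashlib.sha256(png).hexdigest()
```

Store the digest per deployment and compare: if before != after, escalate to a pixel diff or a vision-model check.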
The screenshot is returned as a polymorphic object with two delivery modes.

Base64 (default):

{ "type": "base64", "data": "iVBORw0KGgo...", "format": "png" }

Presigned URL:

{ "type": "url", "url": "https://...", "expires_at": "2025-12-19T..." }

Control delivery via the screenshot_delivery option:
- "base64" — inline base64 (~1MB payload, default)
- "url" — presigned URL (~200KB payload, expires in 24h, recommended for AI agents)

Visual Mode captures the visible viewport only (1024×768 standard), not full-page screenshots.

Payload sizes: base64 produces a ~500KB-1MB JSON payload; URL delivery produces ~200KB (the image is downloaded separately).

Handle both formats in your code:

const imgSrc = screenshot.type === 'url' ? screenshot.url : `data:image/png;base64,${screenshot.data}`;

Here's how to combine Visual Mode with GPT-4V for intelligent verification:
# Step 1: Navigate to product page (fast, Map Mode)
page = sentience.observe(
    url="https://amazon.com/product/B123",
    mode="map",
)
add_to_cart = find_element(page, text="Add to Cart")

# Step 2: Visual verification before clicking (Visual Mode, URL delivery
# so the screenshot can be passed straight to the vision API)
visual = sentience.observe(
    url="https://amazon.com/product/B123",
    mode="visual",
    options={"limit": 50, "screenshot_delivery": "url"},
)

# Step 3: Send the screenshot to GPT-4V for verification
response = openai.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Is this the 'Wireless Headphones' product? Answer yes or no."},
            {"type": "image_url", "image_url": {"url": visual["screenshot"]["url"]}},
        ],
    }],
)
is_correct = response.choices[0].message.content.strip().lower().startswith("yes")

# Step 4: Only proceed if verification passes
if is_correct:
    click(add_to_cart["bbox"])
    print("Product verified and added to cart!")
else:
    print("Wrong product detected, aborting.")

If Visual Mode encounters an error, you'll receive an error response:
{
  "error": "Failed to render page: timeout after 30s"
}

If only the screenshot capture fails, the response still includes the geometry, with screenshot_error containing the error message and screenshot: null.

400 Bad Request — Invalid URL, mode, or options
401 Unauthorized — Missing or invalid API key
500 Internal Server Error — Failed to capture screenshot or extract elements
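The status codes above map naturally onto client-side exceptions. This sketch (check_observe_response is an illustrative helper, not part of any SDK) also surfaces the partial-failure case where geometry succeeds but the screenshot does not:

```python
def check_observe_response(status: int, body: dict) -> dict:
    """Raise on hard failures per the error table; return the body otherwise."""
    if status == 400:
        raise ValueError(body.get("error", "invalid URL, mode, or options"))
    if status == 401:
        raise PermissionError("missing or invalid API key")
    if status >= 500:
        raise RuntimeError(body.get("error", "failed to capture screenshot or extract elements"))
    if body.get("screenshot_error"):
        # Partial success: geometry is still usable, screenshot is null
        print("screenshot failed:", body["screenshot_error"])
    return body
```

A retry policy can then key off the exception type: 5xx failures are often transient and worth retrying, while 400/401 indicate a request or credential problem that retries will not fix.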