High-speed Markdown extraction for RAG pipelines. Converts noisy HTML into normalized, LLM-ready text.
Reader Mode bypasses the heavy rendering pipeline (Chrome/Puppeteer) and uses a specialized Rust-based parser to strip navigation, ads, footers, and tracking scripts. It returns only the semantic content.
Automatically filters sidebars, popups, and non-content DOM nodes.
Reduces whitespace and formatting overhead by ~30% vs raw HTML.
curl -X POST https://api.sentienceapi.com/v1/observe \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://news.ycombinator.com",
    "mode": "read",
    "format": "markdown",
    "options": {
      "contentLimit": 50000
    }
  }'

url (required): Target HTTP/HTTPS URL.
mode (required): Must be set to "read".
format (optional): Output format: "markdown" (default) or "text".
options.contentLimit (optional): Maximum characters in content (default: 50000).
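The same request can be sketched in Python using only the standard library. The endpoint, headers, and body fields are taken from the curl example and parameter list above. The helper names (`build_read_request`, `parse_read_response`, `fetch_readable`), the 30-second timeout, and the treatment of any status other than "success" as an error are assumptions made for illustration, not part of the documented API.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.sentienceapi.com/v1/observe"


def build_read_request(url, fmt="markdown", content_limit=50000):
    """Build the JSON body for a Reader Mode call (hypothetical helper)."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must be an HTTP/HTTPS URL")
    if fmt not in ("markdown", "text"):
        raise ValueError('format must be "markdown" or "text"')
    return {
        "url": url,
        "mode": "read",  # required value for Reader Mode
        "format": fmt,
        "options": {"contentLimit": content_limit},
    }


def parse_read_response(body):
    """Return the extracted content from a JSON response string.

    Assumption: any status other than "success" is treated as a failure;
    the error shape is not documented in this section.
    """
    data = json.loads(body)
    if data.get("status") != "success":
        raise RuntimeError(f"read request failed: {data.get('status')!r}")
    return data["content"]


def fetch_readable(url, api_key, **kwargs):
    """POST the request and return the extracted content."""
    payload = json.dumps(build_read_request(url, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        API_URL,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    # The 30-second timeout is an arbitrary choice for this sketch.
    with urllib.request.urlopen(req, timeout=30) as resp:
        return parse_read_response(resp.read().decode("utf-8"))
```

A call such as `fetch_readable("https://news.ycombinator.com", api_key)` would then return the Markdown string from the `content` field, with `api_key` being your own key.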
{
  "status": "success",
  "url": "https://news.ycombinator.com",
  "title": "Hacker News",
  "content": "# Hacker News\n\nNew | Past | Comments | Ask | Show...\n\n## Top Stories\n\n### Show HN: I built a perception layer for AI agents\n\nWe're excited to share SentienceAPI...",
  "format": "markdown",
  "author": null,
  "published_date": null,
  "word_count": 1247,
  "reading_time_minutes": 6,
  "timestamp": "2025-12-12T10:30:00.123Z"
}

Multi-line breaks and tabs are compressed to single markdown breaks.
Decorative or empty links are stripped; semantic links are preserved.
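To make the two normalization rules above concrete, here is a rough Python illustration of the kind of cleanup they describe. It is not the service's Rust parser: the function name `normalize_markdown` and the regular expressions are assumptions for this sketch, and it only treats links with empty text as "decorative or empty".

```python
import re


def normalize_markdown(text):
    """Illustrative cleanup only; not the actual Reader Mode implementation."""
    # Tabs become single spaces.
    text = text.replace("\t", " ")
    # Drop links whose text is empty or whitespace-only, e.g. [](https://x).
    # The lookbehind leaves image syntax such as ![](src) untouched.
    text = re.sub(r"(?<!!)\[\s*\]\([^)]*\)", "", text)
    # Remove trailing spaces at the end of each line.
    text = re.sub(r"[ ]+\n", "\n", text)
    # Collapse runs of three or more newlines into one blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Links with visible text, such as `[docs](https://example.com)`, pass through unchanged, which matches the stated goal of preserving semantic links.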