
Content Reading API

Extract page content as text, markdown, or raw HTML using the read() function.

Basic Usage

from sentience import read

# `browser` below is an existing SentienceBrowser instance (see Parameters)

# Get markdown content
result = read(browser, format="markdown")
print(result["content"])

# Get plain text
result = read(browser, format="text")
print(result["content"])

# Get raw HTML (for external processing)
result = read(browser, format="raw")
html = result["content"]

Parameters

Python:

browser (SentienceBrowser): Browser instance
format (str): Output format - "raw" (default), "text", or "markdown"
enhance_markdown (bool): Use markdownify for better conversion (default: True)

TypeScript:

browser (SentienceBrowser): Browser instance
options (object, optional):
- format (string): "raw", "text", or "markdown"

Returns

A dict (Python) or object (TypeScript) with the following fields:

status: "success" or "error"
url: Page URL
format: Output format
content: Extracted content (string)
length: Content length in characters
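Because the result carries a status field, it can help to check it before using content. The following is a minimal sketch under stated assumptions: ReadError and content_or_raise are hypothetical helpers, not part of the SDK, and the sample dict is hand-written to match the fields listed above rather than captured from a real read() call.

```python
from typing import Any, Mapping


class ReadError(RuntimeError):
    """Hypothetical exception for a failed read(); not part of the SDK."""


def content_or_raise(result: Mapping[str, Any]) -> str:
    """Return result["content"] if status is "success", otherwise raise."""
    if result.get("status") != "success":
        url = result.get("url", "<unknown url>")
        raise ReadError(f"read() did not succeed for {url}")
    return result["content"]


# Hand-written sample shaped like the documented return value.
sample = {
    "status": "success",
    "url": "https://example.com/article",
    "format": "text",
    "content": "Hello, world",
    "length": 12,
}

print(content_or_raise(sample))
```

In real use, the sample dict would be replaced by the value returned from read(browser, format="text").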

Format Options

"raw" (default):

Returns the raw HTML content of the page
Useful for custom processing or parsing with external libraries

"text":

Extracts plain text content
Strips HTML tags and formatting
Useful for text analysis or NLP tasks

"markdown":

Converts HTML to Markdown format
Preserves structure (headings, lists, links)
Uses markdownify for higher-quality conversion when enhance_markdown is True (the Python default)
Useful for documentation or content extraction
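As an illustration of the custom processing the "raw" format is meant for, the sketch below collects link targets from an HTML string using only Python's standard html.parser module. LinkCollector and extract_links are hypothetical names, and sample_html stands in for the content field of a read(browser, format="raw") result.

```python
from html.parser import HTMLParser
from typing import List


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it encounters."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> List[str]:
    """Return all non-empty href values in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Stand-in for result["content"] from a "raw" read.
sample_html = (
    '<p>See <a href="/docs">docs</a> and '
    '<a href="https://example.com">home</a>.</p>'
)

print(extract_links(sample_html))
```

A third-party parser would work equally well here; the standard library is used only to keep the sketch free of extra dependencies.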

Example Use Cases

Extract article content:

browser.page.goto("https://example.com/article")
result = read(browser, format="markdown")
article_content = result["content"]
print(f"Article length: {result['length']} characters")

Extract text for analysis:

result = read(browser, format="text")
text_content = result["content"]
# Use with NLP libraries or text analysis tools
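One simple form of text analysis is a word-frequency count. The sketch below uses only the standard library; top_words is a hypothetical helper, and sample_text stands in for the content field of a "text" read.

```python
import re
from collections import Counter
from typing import List, Tuple


def top_words(text: str, n: int = 3) -> List[Tuple[str, int]]:
    """Return the n most frequent lowercase words with their counts."""
    # Treat runs of ASCII letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


# Stand-in for result["content"] from a "text" read.
sample_text = "The cat sat. The cat ran. The dog barked."

print(top_words(sample_text, 2))
```

The regular expression is deliberately naive and only handles ASCII words; a real NLP pipeline would normally use a proper tokenizer.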

Save content to file:

result = read(browser, format="markdown")
with open("page_content.md", "w", encoding="utf-8") as f:
    f.write(result["content"])
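When saving several formats, the file extension can be derived from the format field of the result. This is a sketch only: save_result and the EXTENSIONS mapping are hypothetical and not part of the SDK, and the sample dict is hand-written to match the documented fields.

```python
import tempfile
from pathlib import Path
from typing import Any, Mapping

# Assumed mapping from the documented format names to file extensions.
EXTENSIONS = {"markdown": ".md", "text": ".txt", "raw": ".html"}


def save_result(
    result: Mapping[str, Any], directory: Path, stem: str = "page_content"
) -> Path:
    """Write result["content"] to directory and return the file path."""
    # Fall back to .txt if the format is missing or unrecognized.
    suffix = EXTENSIONS.get(result.get("format"), ".txt")
    path = Path(directory) / f"{stem}{suffix}"
    path.write_text(result["content"], encoding="utf-8")
    return path


# Hand-written sample shaped like a "markdown" read result.
sample = {"status": "success", "format": "markdown", "content": "# Title\n"}

with tempfile.TemporaryDirectory() as tmp:
    saved = save_result(sample, Path(tmp))
    print(saved.name)
```

The temporary directory is only there to keep the sketch side-effect free; in practice any writable directory can be passed.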
