Extract page content as text, markdown, or raw HTML using the read() function.
from sentience import read
# Get markdown content
result = read(browser, format="markdown")
print(result["content"])
# Get plain text
result = read(browser, format="text")
print(result["content"])
# Get raw HTML (for external processing)
result = read(browser, format="raw")
html = result["content"]

Python:
browser (SentienceBrowser): Browser instance
format (str): Output format - "raw" (default), "text", or "markdown"
enhance_markdown (bool): Use markdownify for better conversion (default: True)

TypeScript:
browser (SentienceBrowser): Browser instance
options (object, optional):
    format (string): "raw", "text", or "markdown"

Dict/object with:
status: "success" or "error"
url: Page URL
format: Output format
content: Extracted content (string)
length: Content length in characters

"raw" (default):
"text":
"markdown":
    markdownify for better conversion quality

Extract article content:
browser.page.goto("https://example.com/article")
result = read(browser, format="markdown")
article_content = result["content"]
print(f"Article length: {result['length']} characters")

Extract text for analysis:
result = read(browser, format="text")
text_content = result["content"]
# Use with NLP libraries or text analysis tools

Save content to file:
result = read(browser, format="markdown")
with open("page_content.md", "w", encoding="utf-8") as f:
    f.write(result["content"])
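
Check status before saving:
The examples above use result["content"] directly, but the return value also carries a status of "success" or "error". The sketch below shows one way to check that field before writing to disk and to pick a file extension from the reported format. The helper name save_result, the EXTENSIONS mapping, and the choice to raise RuntimeError are assumptions made for illustration, not part of the SDK. What other fields an error result contains is not documented here, so the sketch reports only the status and URL.

```python
from pathlib import Path

# Hypothetical mapping from the documented format names to file extensions.
EXTENSIONS = {"raw": ".html", "text": ".txt", "markdown": ".md"}


def save_result(result: dict, stem: str, directory: str = ".") -> Path:
    """Write result["content"] to <directory>/<stem><ext> and return the path.

    Raises RuntimeError when result["status"] is not "success".
    """
    if result.get("status") != "success":
        # Only the status and URL are reported, since the shape of an
        # error result is an unknown here.
        raise RuntimeError(
            f"read() failed for {result.get('url')!r}: "
            f"status={result.get('status')!r}"
        )
    # Fall back to .txt if the format field is missing or unrecognized.
    ext = EXTENSIONS.get(result.get("format"), ".txt")
    path = Path(directory) / f"{stem}{ext}"
    path.write_text(result["content"], encoding="utf-8")
    return path
```

With a live browser this could be called as save_result(read(browser, format="markdown"), "page_content"), which would write page_content.md in the current directory when the read succeeds.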