Extract

Data Extraction API for Websites and URLs

Extract readable text, page content, and structured fields from websites and URLs with an API built for web content extraction, CSS selector scraping, and JSON output.

POST api.opengraph.io/api/3.0/extract
site=https://example.com/article/ml-overview

Response 200 OK
url: "https://example.com/article/ml-overview"
concatenatedText: "Machine learning is transforming how..."
data.headline: "The Future of Machine Learning"
data.author: "Jane Smith"
data.publishDate: "2025-06-01"
data.content: "Recent advances in neural networks..."

1B+ URLs processed
1,000s of developers

Powering link previews, metadata extraction, Markdown conversion, web scraping, and content extraction workflows at scale.

Use cases

Extract useful content from any webpage

The Extract API helps developers pull readable text and structured content from websites without manually writing scrapers, parsing HTML, or cleaning noisy page markup.

  • Extract text from websites
  • Extract text from URLs
  • Structured content extraction
  • CSS selector scraping
  • Content enrichment
  • Research workflows
  • AI data preparation
  • Internal tools
  • Monitoring workflows
  • JSON output

API response

Readable text and structured output from one URL

Send a URL and receive extracted content your application can store, analyze, display, or pass into downstream workflows.

Extracted text

Pull readable text from a webpage without manually cleaning raw HTML.

Main content

Retrieve the primary page content while reducing navigation, layout, and boilerplate noise.

Page title

Capture page-level context alongside the extracted content.

Source URL

Track exactly where the extracted content came from.

CSS selector output

Use selectors to extract specific elements from the page when you need targeted fields.

Structured JSON

Receive extracted content in a developer-friendly JSON response your app can process directly.

Parsed HTML fields

Scrape and parse HTML into named fields your application can store or forward downstream.

Extraction metadata

Receive status, source, and response details to support logging, debugging, and downstream processing.
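
The sample response at the top of this page suggests a JSON body with a `url`, a `concatenatedText` string, and a `data` object holding named fields. The following is a minimal Python sketch, assuming exactly that shape, of how an application might turn a response into a typed value. `ExtractResult` and `parse_extract_response` are hypothetical helpers, not an official SDK, and the authoritative schema is in the API reference.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ExtractResult:
    """Typed view of an Extract API response (shape assumed from the sample)."""

    url: str
    concatenated_text: str
    # Named, selector-matched fields such as "headline" or "author".
    fields: Dict[str, Any] = field(default_factory=dict)


def parse_extract_response(body: str) -> ExtractResult:
    """Parse a JSON response body; a missing `data` object yields no fields."""
    payload = json.loads(body)
    return ExtractResult(
        url=payload["url"],
        concatenated_text=payload.get("concatenatedText", ""),
        fields=dict(payload.get("data") or {}),
    )


sample = """{
  "url": "https://example.com/article/ml-overview",
  "concatenatedText": "Machine learning is transforming how...",
  "data": {
    "headline": "The Future of Machine Learning",
    "author": "Jane Smith",
    "publishDate": "2025-06-01"
  }
}"""

result = parse_extract_response(sample)
print(result.fields["headline"])  # The Future of Machine Learning
```

Keeping `data` as a plain dictionary avoids hard-coding field names, since the selector-matched fields differ from one request to the next.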

API comparison

Choose the right Extract API

Three extraction APIs, three different outputs. Here is how to know which one fits your workflow.

Content Extraction API (this page)

Extract text and selected fields from websites and URLs.

Use when:

You want usable content or specific page fields without processing full raw HTML yourself.

Web Scraping API

Retrieve raw or rendered HTML from public URLs.

Use when:

You need lower-level access to webpage HTML and want to control your own parsing and processing.

HTML to Markdown API

Convert webpages or HTML into clean Markdown.

Use when:

You need Markdown for AI apps, RAG pipelines, research workflows, summarization, or documentation.

Text extraction

Extract text from websites without parsing HTML yourself

Raw HTML is often filled with layout markup, navigation, scripts, ads, footers, and unrelated elements. The Extract API helps you pull the useful text and page content from a URL so your application can work with cleaner web data.

  • Extract text from a website URL in one API call
  • Pull readable content from noisy HTML automatically
  • Reduce manual parsing and cleanup work
  • Create cleaner inputs for content workflows
  • Store web content in a structured, portable format
  • Use extracted text for analysis, monitoring, enrichment, or AI workflows
Raw HTML (input)
<nav>...</nav><script>...</script><div class="ads">...</div><article>Machine learning is...</article><footer>...</footer>
Extracted text (output)
“Machine learning is transforming how developers build products. Recent advances in neural...”
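
To make the input/output example concrete, here is a toy Python sketch of boilerplate removal using only the standard library's `html.parser`. It is not OpenGraph.io's extraction method, which this page does not describe. The tag and class lists it treats as noise are assumptions chosen to match the sample markup.

```python
from html.parser import HTMLParser

# Elements treated as boilerplate in this toy example (assumed lists,
# not the API's actual rules).
SKIP_TAGS = {"nav", "script", "style", "footer", "header", "aside"}
SKIP_CLASSES = {"ads"}
# Void elements never get an end tag, so they are not tracked on the stack.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class ReadableTextParser(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self._stack = []   # one bool per open element: True if boilerplate
        self._chunks = []  # text fragments that survived filtering

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        self._stack.append(tag in SKIP_TAGS or bool(classes & SKIP_CLASSES))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Keep text only when no enclosing element is boilerplate.
        if not any(self._stack) and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def extract_readable_text(html: str) -> str:
    parser = ReadableTextParser()
    parser.feed(html)
    parser.close()
    return parser.text()


print(extract_readable_text(
    '<nav>Home</nav><script>var x = 1;</script><div class="ads">Buy now</div>'
    "<article>Machine learning is transforming how developers build products."
    "</article><footer>Copyright</footer>"
))
```

Real pages quickly defeat fixed tag lists: ads without an `ads` class, content buried in nested `<div>` elements, text injected by JavaScript. That cleanup is the work the Extract API is meant to take off your hands.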
CSS selectors

Target specific page elements with CSS selectors

When you need more than the main readable text, CSS selector extraction lets you target specific elements on a page and return structured values your application can use.

CSS selector scraping

Use CSS selectors to pull specific elements from a webpage by targeting the exact markup you need.

Structured field extraction

Return targeted values instead of full-page text — headlines, authors, dates, or any named field.

JSON output

Receive selected fields in a structured JSON response your application can process, store, or forward.

Scrape and parse HTML

Turn page markup into cleaner extracted fields without writing your own HTML parser.
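
A request that asks for named fields might be assembled like this. The `site` parameter comes from the sample request at the top of the page. The `selectors` key (a mapping from a field name to a CSS selector) and the `build_selector_payload` helper are hypothetical, so confirm the real parameter names in the API reference before relying on them.

```python
import json
from typing import Dict, Optional
from urllib.parse import urlparse


def build_selector_payload(site: str,
                           selectors: Optional[Dict[str, str]] = None) -> str:
    """Return a JSON request body for the Extract API (hypothetical shape).

    `site` mirrors the sample request; `selectors` maps a field name of your
    choosing to a CSS selector and is an assumed parameter name.
    """
    parsed = urlparse(site)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected a public http(s) URL, got {site!r}")

    body: Dict[str, object] = {"site": site}
    if selectors:
        # Reject blank names or selectors before spending a request on them.
        for name, css in selectors.items():
            if not name.strip() or not css.strip():
                raise ValueError(f"blank field name or selector for {name!r}")
        body["selectors"] = selectors
    # With no selectors the key is omitted, leaving content selection to
    # smart extraction (assumed default behavior).
    return json.dumps(body)


print(build_selector_payload(
    "https://example.com/article/ml-overview",
    {"headline": "h1", "author": ".byline", "publishDate": "time[datetime]"},
))
```

Validating locally is a design choice. A typo in a selector usually surfaces only as a missing field in the response, so catching blank values before the call keeps that failure mode small.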

Extraction modes

Use smart extraction by default, selectors when you need control

Smart extraction

Best when you want the main readable content from a URL without configuring selectors.

Use it for:

  • Article text
  • Page content
  • Research summaries
  • Content enrichment
  • Monitoring workflows

Selector-based extraction

Best when you know the exact page elements you want to pull.

Use it for:

  • Specific named fields
  • Repeated page elements
  • Custom page structures
  • Structured JSON output
  • Targeted scraping workflows
How it works

From URL to extracted content in seconds

01

Send a URL

Pass a public webpage URL to OpenGraph.io. Add CSS selectors if you need targeted fields.

02

Choose smart extraction or selectors

Let OpenGraph.io extract readable content automatically, or provide selectors for specific elements.

03

Receive extracted content

Use the returned text or JSON output in your app, pipeline, research workflow, or internal tool.
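
The three steps map onto a single HTTP call. This Python sketch uses only `urllib.request`. The endpoint host and path come from the sample request on this page. The `https` scheme, the `app_id` query parameter carrying the API key, the JSON body shape, and the helper names are assumptions, not documented behavior.

```python
import json
import urllib.request
from typing import Any, Dict, Optional
from urllib.parse import urlencode

# Host and path from the sample request; the https scheme is assumed.
EXTRACT_ENDPOINT = "https://api.opengraph.io/api/3.0/extract"


def build_extract_request(api_key: str, site: str,
                          selectors: Optional[Dict[str, str]] = None
                          ) -> urllib.request.Request:
    """Steps 1 and 2: describe the URL and, optionally, CSS selectors."""
    body: Dict[str, Any] = {"site": site}
    if selectors:
        body["selectors"] = selectors  # hypothetical parameter name
    # Passing the key as an `app_id` query parameter is an assumption.
    query = urlencode({"app_id": api_key})
    return urllib.request.Request(
        f"{EXTRACT_ENDPOINT}?{query}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
        method="POST",
    )


def extract(api_key: str, site: str,
            selectors: Optional[Dict[str, str]] = None,
            timeout: float = 30.0) -> Dict[str, Any]:
    """Step 3: send the request and decode the JSON response.

    urlopen raises urllib.error.HTTPError for non-2xx status codes.
    """
    request = build_extract_request(api_key, site, selectors)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would look like `extract("YOUR_API_KEY", "https://example.com/article/ml-overview")`. `build_extract_request` is kept separate from `extract` so the request can be inspected or logged without making a network call.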

Developer experience

Built for developers who need usable web content

Simple API access

Start with an API key and extract content from URLs. No SDK or scraper configuration required.

Smart defaults

Extract readable content without configuring selectors — the API handles rendering and cleanup.

Selector support

Use CSS selectors when you need targeted fields instead of full-page extracted text.

JSON output

Receive structured data that is easy to store, process, and pass into downstream systems.

Free requests to start

Test real URLs before scaling into production. No credit card required.

Works across URL workflows

Pair extracted content with Markdown conversion, raw HTML retrieval, screenshots, and metadata.

FAQ

A data extraction API lets developers send a URL and receive extracted content or structured fields that can be used in applications, pipelines, and internal tools — without building a custom scraper or HTML parser.
The API extracts readable text from a public webpage URL, so your application does not have to parse the full HTML response itself.
Send a URL to the API and receive extracted page content or selected fields in a structured JSON response.
The Web Scraping API focuses on retrieving raw or rendered HTML. The Content Extraction API focuses on pulling usable text or structured fields from that page content, so your application gets cleaner data without processing raw HTML itself.
The HTML to Markdown API returns Markdown output for AI and content workflows. The Content Extraction API returns extracted text or structured fields, which suits cases where you want specific fields or content as JSON rather than Markdown.
Provide CSS selectors in your request when you need to target specific elements or named fields on a page.
The API returns extracted content in a structured JSON response that includes the concatenated text and any selector-matched fields your application requested.
Full documentation — including request parameters, selector usage, response schema, and code examples — is available in the API reference.

Start extracting content from websites

Use OpenGraph.io to extract readable text, selected fields, and structured content from websites and URLs without building a custom scraping and parsing pipeline.

No credit card required. Free requests included.