Content Extraction API
Extract specific HTML elements (titles, headings, paragraphs) in a structured, LLM-ready format. Feed clean web data directly into RAG pipelines, content analysis tools, or AI applications without running your own scraper infrastructure.
Endpoint
HTTP
GET https://opengraph.io/api/1.1/extract/{encoded_url}?app_id=YOUR_APP_ID
Parameters
Path Parameters
| Parameter | Type | Description |
|---|---|---|
| encoded_url | string | Required. URL-encoded target URL |
Query Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| app_id | string | - | Required. Your API key |
| html_elements | string | title,h1,h2,h3,h4,h5,p | Comma-separated list of HTML elements to extract |
| full_render | boolean | false | Enable JavaScript rendering |
| cache_ok | boolean | true | Allow cached results |
| use_proxy | boolean | false | Use standard proxy for protected sites |
| use_premium | boolean | false | Use residential proxy |
| use_superior | boolean | false | Use mobile proxy (highest success rate) |
| accept_lang | string | en-US | Language header for localized content |
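The path and query parameters above can be combined into a request URL roughly as follows. This is a minimal sketch, not an official client: the helper name `build_extract_url` is hypothetical, and sending booleans as lowercase `true`/`false` is an assumption about the spelling the API accepts.

```python
from urllib.parse import quote, urlencode

BASE = "https://opengraph.io/api/1.1/extract"


def build_extract_url(target_url, app_id, html_elements=None, **options):
    """Build a Content Extraction request URL (hypothetical helper)."""
    # Percent-encode the whole target URL so it fits in one path segment.
    encoded = quote(target_url, safe="")
    params = {"app_id": app_id}
    if html_elements:
        # html_elements is a comma-separated list, e.g. "title,h1,p".
        params["html_elements"] = ",".join(html_elements)
    for key, value in options.items():
        # Assumption: boolean flags are sent as lowercase "true"/"false".
        params[key] = str(value).lower() if isinstance(value, bool) else value
    # Keep commas literal so the query matches the documented request style.
    return f"{BASE}/{encoded}?{urlencode(params, safe=',')}"


print(build_extract_url("https://example.com", "YOUR_APP_ID", ["title", "h1", "p"]))
```

Optional flags such as `full_render` or `use_proxy` can be passed as keyword arguments, for example `build_extract_url(url, app_id, full_render=True)`.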
Supported HTML Elements
| Element | Description |
|---|---|
| title | Page title tag |
| h1, h2, h3, h4, h5, h6 | Heading elements |
| p | Paragraphs |
| a | Links |
| li | List items |
| span | Span elements |
| div | Div elements |
Example Request
curl "https://opengraph.io/api/1.1/extract/https%3A%2F%2Fexample.com?app_id=YOUR_APP_ID&html_elements=title,h1,p"
Example Response
Response
{
  "tags": [
    {
      "tag": "title",
      "innerText": "Example Domain",
      "position": 0
    },
    {
      "tag": "h1",
      "innerText": "Example Domain",
      "position": 1
    },
    {
      "tag": "p",
      "innerText": "This domain is for use in illustrative examples in documents.",
      "position": 2
    },
    {
      "tag": "p",
      "innerText": "You may use this domain in literature without prior coordination.",
      "position": 3
    }
  ],
  "concatenatedText": "Example Domain Example Domain This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination.",
  "requestInfo": {
    "host": "example.com",
    "responseCode": 200
  }
}
Response Fields
| Field | Description |
|---|---|
| tags | Array of extracted elements with tag name, text content, and position |
| concatenatedText | All extracted text joined together (useful for LLM summarization) |
| requestInfo | Metadata about the request |
LLM Tip: Use concatenatedText when feeding content to AI models for summarization. It provides clean text without HTML markup.
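As an illustration of working with these fields, the sketch below groups the entries in `tags` by element name and reads `concatenatedText` for use as model input. The `sample` payload is a shortened copy of the example response, and `group_by_tag` is a hypothetical helper, not part of the API.

```python
import json
from collections import defaultdict

# Shortened copy of the example response shown in this document.
sample = json.loads("""
{
  "tags": [
    {"tag": "title", "innerText": "Example Domain", "position": 0},
    {"tag": "h1", "innerText": "Example Domain", "position": 1},
    {"tag": "p", "innerText": "This domain is for use in illustrative examples in documents.", "position": 2},
    {"tag": "p", "innerText": "You may use this domain in literature without prior coordination.", "position": 3}
  ],
  "concatenatedText": "Example Domain Example Domain This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination.",
  "requestInfo": {"host": "example.com", "responseCode": 200}
}
""")


def group_by_tag(response):
    """Map each tag name to its texts, ordered by the position field."""
    grouped = defaultdict(list)
    for item in sorted(response["tags"], key=lambda t: t["position"]):
        grouped[item["tag"]].append(item["innerText"])
    return dict(grouped)


grouped = group_by_tag(sample)
print(grouped["h1"])      # texts of all h1 elements
print(len(grouped["p"]))  # number of extracted paragraphs

# Plain text suitable for passing to a summarization prompt.
llm_input = sample["concatenatedText"]
```

Grouping by tag is useful when only part of the page matters (for example, headings only), while `concatenatedText` is the simpler choice when the whole page should be summarized.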
Use Cases
- AI/LLM data pipelines – feed clean text to language models
- Content analysis and summarization
- SEO content auditing – check heading structure
- Research and data collection
- Automated reporting
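For the SEO auditing use case, a heading-structure check over the `tags` array might look like the sketch below. The rules it applies (exactly one h1, no skipped heading levels) are common heuristics chosen for illustration, not something the API defines, and `audit_headings` is a hypothetical helper. Since the default `html_elements` value omits h6, a full audit would request `h1,h2,h3,h4,h5,h6` explicitly.

```python
import re


def audit_headings(tags):
    """Return a list of heading-structure issues (illustrative heuristics)."""
    # Keep only h1-h6 entries, in document order, as (level, text) pairs.
    headings = [
        (int(t["tag"][1]), t["innerText"])
        for t in sorted(tags, key=lambda t: t["position"])
        if re.fullmatch(r"h[1-6]", t["tag"])
    ]
    issues = []
    h1_count = sum(1 for level, _ in headings if level == 1)
    if h1_count == 0:
        issues.append("no h1 found")
    elif h1_count > 1:
        issues.append(f"{h1_count} h1 elements found")
    previous = 0
    for level, text in headings:
        # Flag jumps such as h1 -> h3 that skip an intermediate level.
        if previous and level > previous + 1:
            issues.append(f"h{previous} followed by h{level}: {text!r}")
        previous = level
    return issues


example_tags = [
    {"tag": "h1", "innerText": "Overview", "position": 0},
    {"tag": "p", "innerText": "Intro text.", "position": 1},
    {"tag": "h3", "innerText": "Details", "position": 2},
]
print(audit_headings(example_tags))
```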
MCP Tool
This endpoint is available as the Extract Content tool in the OpenGraph MCP Server. Your AI assistant can extract elements directly without writing any code.