Introduction
Welcome to the API documentation for Opengraph.io! With our API, you can fetch the HTML of any URL you provide and parse it for Open Graph tags.
We have language bindings in Node.js, Ruby, PHP, C#, and jQuery! Code examples in each language accompany the endpoint descriptions below.
Authentication
https://opengraph.io/api/1.1/site/<URL encoded site URL>?app_id=xxxxxx
To authenticate your API requests, simply include your unique app_id query parameter with each request. This app_id serves as your API key and is provided to you upon registration.
To get started with the Opengraph.io API, you'll need to create a free account to receive your app_id. Opengraph.io Signup
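For example, the target site URL must itself be URL-encoded before it is placed in the request path, with the app_id riding along as a query parameter. A minimal sketch in JavaScript (the buildSiteUrl helper name is ours, not part of the API):

```javascript
// Build an authenticated Opengraph.io request URL. The site URL must be
// URL-encoded before being placed in the path; app_id authenticates the call.
function buildSiteUrl(siteUrl, appId) {
  const encodedSite = encodeURIComponent(siteUrl);
  return `https://opengraph.io/api/1.1/site/${encodedSite}?app_id=${appId}`;
}

console.log(buildSiteUrl("https://example.com", "xxxxxx"));
// https://opengraph.io/api/1.1/site/https%3A%2F%2Fexample.com?app_id=xxxxxx
```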
API Endpoints
Get Open Graph
// JavaScript
const url = "https://opengraph.io/api/1.1/site/:site?app_id=xxxxxx";

async function fetchData() {
  try {
    const response = await fetch(url);
    const data = await response.json();
    console.log(data);
  } catch (error) {
    console.error(error);
  }
}

fetchData();
# Ruby
require 'net/http'
require 'json'

url = URI("https://opengraph.io/api/1.1/site/:site?app_id=xxxxxx")
http = Net::HTTP.new(url.host, url.port)
http.use_ssl = true

request = Net::HTTP::Get.new(url)
response = http.request(request)
data = JSON.parse(response.body)
puts data
// C#
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json;

class Program
{
    static async Task Main(string[] args)
    {
        var url = "https://opengraph.io/api/1.1/site/:site?app_id=xxxxxx";
        using (var httpClient = new HttpClient())
        {
            using (var response = await httpClient.GetAsync(url))
            {
                string apiResponse = await response.Content.ReadAsStringAsync();
                dynamic data = JsonConvert.DeserializeObject(apiResponse);
                Console.WriteLine(data);
            }
        }
    }
}
<?php
// PHP
$url = 'https://opengraph.io/api/1.1/site/:site?app_id=xxxxxx';
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($curl);
$data = json_decode($response, true);
print_r($data);
The API returns JSON structured like this:
{
  "hybridGraph": {
    "title": "Example Title",
    "description": "Example Description",
    "type": "Example Type",
    "image": "https://example.com/image.png",
    "url": "https://example.com",
    "favicon": "https://example.com/favicon.ico",
    "site_name": "Example Site Name",
    "articlePublishedTime": "2023-03-23T00:00:00.000Z",
    "articleAuthor": "https://example.com/author"
  },
  "openGraph": {
    "title": "Example Title",
    "description": "Example Description",
    "type": "Example Type",
    "image": {
      "url": "https://example.com/image.png"
    },
    "url": "https://example.com",
    "site_name": "Example Site Name",
    "articlePublishedTime": "2023-03-23T00:00:00.000Z",
    "articleAuthor": "https://example.com/author"
  },
  "htmlInferred": {
    "title": "Example Title",
    "description": "Example Description",
    "type": "Example Type",
    "image": "https://example.com/image.png",
    "url": "https://example.com",
    "favicon": "https://example.com/favicon.ico",
    "site_name": "Example Site Name",
    "images": [
      "https://example.com/image1.png",
      "https://example.com/image2.png",
      "https://example.com/image3.png",
      "https://example.com/image4.png"
    ]
  },
  "requestInfo": {
    "redirects": 1,
    "host": "https://example.com",
    "responseCode": 200,
    "cache_ok": true,
    "max_cache_age": 432000000,
    "accept_lang": "en-US,en;q=0.9",
    "url": "https://example.com",
    "full_render": false,
    "use_proxy": false,
    "use_superior": false,
    "responseContentType": "text/html; charset=utf-8"
  },
  "accept_lang": "en-US,en;q=0.9",
  "is_cache": false,
  "url": "https://example.com"
}
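hybridGraph combines the values from openGraph and htmlInferred, so a consumer can usually read it directly and fall back to the other graphs when it is absent. A minimal sketch (the bestGraph helper is ours, operating on the parsed response shown above):

```javascript
// Read the best-available metadata from a parsed Opengraph.io response.
// hybridGraph already holds merged values; fall back to the other graphs
// only if it is missing.
function bestGraph(result) {
  const g = result.hybridGraph || result.openGraph || result.htmlInferred || {};
  return {
    title: g.title,
    description: g.description,
    // openGraph.image is an object ({ url: ... }); hybridGraph and
    // htmlInferred use a plain string
    image: typeof g.image === "object" && g.image !== null ? g.image.url : g.image,
  };
}

const sample = {
  hybridGraph: {
    title: "Example Title",
    description: "Example Description",
    image: "https://example.com/image.png",
  },
};
console.log(bestGraph(sample));
```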
HTTP Request
GET https://opengraph.io/api/1.1/site/<URL encoded site URL>?app_id=xxxxxx
URL Parameters
Parameter | Required | Description |
---|---|---|
:site | true | The URL-encoded address of the website you want Open Graph information from. |
Query Parameters
Parameter | Required | Example | Description |
---|---|---|---|
app_id | yes | - | The API key for registered users. Create an account (no credit card required) to receive your app_id. |
cache_ok | no | false | Set this to false to force our servers to pull a fresh version of the site being requested. By default this value is true. |
full_render | no | false | This will fully render the site using a Chrome browser before parsing its contents. This is especially helpful for single-page applications and JS redirects, but it slows the response down by around 1.5 seconds. |
use_proxy | no | false | Route your request through residential and mobile proxies to avoid bot detection. This will slow down requests by 3-10 seconds and can cause requests to time out. NOTE: Proxies are a limited resource and expensive for our team to maintain. Free accounts share a small pool of proxies. If you plan on using proxies often, paid accounts provide dedicated concurrent proxies for your account. |
use_premium | no | false | The Premium Proxy feature in our API allows you to leverage residential and mobile proxy pools for enhanced scraping performance. |
use_superior | no | false | The Superior Proxy feature is designed to tackle the most demanding scraping scenarios, allowing you to overcome the challenges posed by highly restrictive websites. By leveraging our superior proxy option, you can bypass bot detection mechanisms and access data from even the toughest sources. |
auto_proxy | no | false | By default, Auto Proxy is turned on and will use a proxy for any domain that our team has identified as requiring one to bypass bot detection. Set the auto_proxy query parameter to false if you would like to turn the feature off. If you find a domain that needs a proxy but isn't being fetched with one automatically, please contact support and we will update our Auto Proxy settings accordingly. |
max_cache_age | no | 432000000 | This specifies the maximum age in milliseconds that a cached response may be. If not specified, the value is set to 5 days (5 days × 24 hours × 60 minutes × 60 seconds × 1000 ms = 432,000,000 ms). |
accept_lang | no | en-US,en;q=0.9 / auto | This specifies the request language sent when requesting the URL. This is useful if you want to get the site in languages other than English; the default setting returns an English version of a page if it exists. Note: if you specify the value auto, the API will use the same language settings as your current request. For more information on what to supply for this field, see: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language |
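As a worked example of the defaults above, 5 days expressed in milliseconds comes out to 432,000,000, and the optional parameters can be appended to a request URL like this (the withOptions helper is ours, not part of the API):

```javascript
// 5 days in milliseconds -- the documented default for max_cache_age.
const FIVE_DAYS_MS = 5 * 24 * 60 * 60 * 1000; // 432000000

// Append optional query parameters (names from the table above) to a request URL.
function withOptions(baseUrl, options) {
  const url = new URL(baseUrl);
  for (const [key, value] of Object.entries(options)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

console.log(FIVE_DAYS_MS); // 432000000
console.log(
  withOptions(
    "https://opengraph.io/api/1.1/site/https%3A%2F%2Fexample.com?app_id=xxxxxx",
    { cache_ok: false, max_cache_age: FIVE_DAYS_MS }
  )
);
```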
Extract Site
// JavaScript
const url = "https://opengraph.io/api/1.1/extract/:site?app_id=xxxxxx&html_elements=title,h1,h2,h3,h4,p";

const fetchData = async () => {
  try {
    const response = await fetch(url);
    const data = await response.json();
    console.log(data);
  } catch (error) {
    console.log(error);
  }
};

fetchData();
# Ruby
require 'net/http'
require 'json'

url = URI.parse("https://opengraph.io/api/1.1/extract/:site?app_id=xxxxxx&html_elements=title,h1,h2,h3,h4,p")
response = Net::HTTP.get_response(url)
data = JSON.parse(response.body)
puts data
// C#
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json;

class Program
{
    static async Task Main(string[] args)
    {
        var url = "https://opengraph.io/api/1.1/extract/:site?app_id=xxxxxx&html_elements=title,h1,h2,h3,h4,p";
        var httpClient = new HttpClient();
        var response = await httpClient.GetAsync(url);
        var jsonResponse = await response.Content.ReadAsStringAsync();
        dynamic data = JsonConvert.DeserializeObject(jsonResponse);
        Console.WriteLine(data);
    }
}
<?php
// PHP
$url = 'https://opengraph.io/api/1.1/extract/:site?app_id=xxxxxx&html_elements=title,h1,h2,h3,h4,p';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
curl_close($ch);
$data = json_decode($result, true);
print_r($data);
The above command returns JSON structured like this:
{
  "tags": [
    { "tag": "title", "innerText": "example innerText", "position": 0 },
    { "tag": "h1", "innerText": "example innerText", "position": 1 },
    { "tag": "h2", "innerText": "example innerText", "position": 2 },
    { "tag": "h2", "innerText": "example innerText", "position": 3 },
    { "tag": "h2", "innerText": "example innerText", "position": 4 },
    { "tag": "h3", "innerText": "example innerText", "position": 5 },
    { "tag": "h3", "innerText": "example innerText", "position": 6 },
    { "tag": "h4", "innerText": "example innerText", "position": 7 },
    { "tag": "p", "innerText": "example innerText", "position": 8 }
  ],
  "concatenatedText": "The concatenatedText property represents the combined text content of all the tags associated with the object, merged into a single string. This property provides a convenient way to access and manipulate the entire text content of the object at once, rather than having to iterate through each tag individually."
}
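The tags array can also be consumed directly; for instance, something like concatenatedText can be rebuilt client-side by joining the entries in position order. Whether the server uses a space or another separator is not specified here, so this sketch (with our own concatenateTags helper) assumes a single space:

```javascript
// Join the innerText of every extracted tag in document order,
// approximating what the concatenatedText field provides server-side.
function concatenateTags(tags) {
  return tags
    .slice() // avoid mutating the caller's array
    .sort((a, b) => a.position - b.position)
    .map((t) => t.innerText)
    .join(" ");
}

const tags = [
  { tag: "h1", innerText: "Hello", position: 1 },
  { tag: "title", innerText: "Example", position: 0 },
  { tag: "p", innerText: "World", position: 2 },
];
console.log(concatenateTags(tags)); // "Example Hello World"
```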
The extract endpoint enables you to extract information from any website by providing its URL. With this endpoint, you can extract any element you need from the website, including but not limited to the title, header elements (h1 to h5), and paragraph elements (p).
HTTP Request
GET /api/1.1/extract/:site?app_id=xxxxxx
URL Parameters
Parameter | Required | Description |
---|---|---|
:site | true | The URL-encoded address of the website you want to extract information from. |
Query Parameters
Parameter | Required | Example | Description |
---|---|---|---|
app_id | yes | - | The API key for registered users. Create an account (no credit card required) to receive your app_id. |
html_elements | no | h1,h2,h3,p,span | This optional parameter specifies the HTML elements you want to extract from the website, as a comma-separated list of element names. If it is not supplied, the default elements extracted are h1, h2, h3, h4, h5, p, and title. |
cache_ok | no | false | Set this to false to force our servers to pull a fresh version of the site being requested. By default this value is true. |
full_render | no | false | This will fully render the site using a Chrome browser before parsing its contents. This is especially helpful for single-page applications and JS redirects, but it slows the response down by around 1.5 seconds. |
use_proxy | no | false | Route your request through residential and mobile proxies to avoid bot detection. This will slow down requests by 3-10 seconds and can cause requests to time out. NOTE: Proxies are a limited resource and expensive for our team to maintain. Free accounts share a small pool of proxies. If you plan on using proxies often, paid accounts provide dedicated concurrent proxies for your account. |
use_premium | no | false | The Premium Proxy feature in our API allows you to leverage residential and mobile proxy pools for enhanced scraping performance. |
use_superior | no | false | The Superior Proxy feature is designed to tackle the most demanding scraping scenarios, allowing you to overcome the challenges posed by highly restrictive websites. By leveraging our superior proxy option, you can bypass bot detection mechanisms and access data from even the toughest sources. |
auto_proxy | no | false | By default, Auto Proxy is turned on and will use a proxy for any domain that our team has identified as requiring one to bypass bot detection. Set the auto_proxy query parameter to false if you would like to turn the feature off. If you find a domain that needs a proxy but isn't being fetched with one automatically, please contact support and we will update our Auto Proxy settings accordingly. |
max_cache_age | no | 432000000 | This specifies the maximum age in milliseconds that a cached response may be. If not specified, the value is set to 5 days (5 days × 24 hours × 60 minutes × 60 seconds × 1000 ms = 432,000,000 ms). |
accept_lang | no | en-US,en;q=0.9 / auto | This specifies the request language sent when requesting the URL. This is useful if you want to get the site in languages other than English; the default setting returns an English version of a page if it exists. Note: if you specify the value auto, the API will use the same language settings as your current request. For more information on what to supply for this field, see: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language |
Note
The response will only include the elements specified in the html_elements query or the default elements if the query is not supplied.
If the website does not contain any of the specified elements, the corresponding keys in the response will be empty lists.
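Putting the parameters together, an Extract request URL might be assembled like this (the buildExtractUrl helper is ours; the element list is illustrative):

```javascript
// Build an Extract request URL. html_elements is a comma-separated list;
// when omitted, the API falls back to its documented defaults
// (h1-h5, p, and title).
function buildExtractUrl(siteUrl, appId, elements) {
  const base = `https://opengraph.io/api/1.1/extract/${encodeURIComponent(siteUrl)}?app_id=${appId}`;
  return elements && elements.length
    ? `${base}&html_elements=${elements.join(",")}`
    : base;
}

console.log(buildExtractUrl("https://example.com", "xxxxxx", ["title", "h1", "p"]));
// https://opengraph.io/api/1.1/extract/https%3A%2F%2Fexample.com?app_id=xxxxxx&html_elements=title,h1,p
```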
Scrape Site
// JavaScript
const url = "https://opengraph.io/api/1.1/scrape/:site?app_id=xxxxxx&html_elements=title,h1,h2,h3,h4,p";

const fetchData = async () => {
  try {
    const response = await fetch(url);
    const data = await response.json();
    console.log(data);
  } catch (error) {
    console.log(error);
  }
};

fetchData();
# Ruby
require 'net/http'
require 'json'

url = URI.parse("https://opengraph.io/api/1.1/scrape/:site?app_id=xxxxxx&html_elements=title,h1,h2,h3,h4,p")
response = Net::HTTP.get_response(url)
data = JSON.parse(response.body)
puts data
// C#
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json;

class Program
{
    static async Task Main(string[] args)
    {
        var url = "https://opengraph.io/api/1.1/scrape/:site?app_id=xxxxxx&html_elements=title,h1,h2,h3,h4,p";
        var httpClient = new HttpClient();
        var response = await httpClient.GetAsync(url);
        var jsonResponse = await response.Content.ReadAsStringAsync();
        dynamic data = JsonConvert.DeserializeObject(jsonResponse);
        Console.WriteLine(data);
    }
}
<?php
// PHP
$url = 'https://opengraph.io/api/1.1/scrape/:site?app_id=xxxxxx&html_elements=title,h1,h2,h3,h4,p';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
curl_close($ch);
$data = json_decode($result, true);
print_r($data);
The above command returns the Raw HTML of a Web Page:
<html>
...
</html>
Just need the raw HTML?
The Scrape Site endpoint is used to scrape the HTML of a website given its URL.
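Because this endpoint returns raw HTML rather than JSON, read the response body as text (response.text() with fetch, not response.json()). As a small illustration, pulling the page title out of the returned markup might look like this (the extractTitle helper is ours, and a regex is only a rough sketch, not a full HTML parser):

```javascript
// Pull the <title> text out of a raw HTML payload, such as the one
// returned by the scrape endpoint. Returns null if no title is present.
function extractTitle(html) {
  const match = html.match(/<title[^>]*>([^<]*)<\/title>/i);
  return match ? match[1].trim() : null;
}

const html = "<html><head><title>Example Domain</title></head><body>...</body></html>";
console.log(extractTitle(html)); // "Example Domain"
```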
HTTP Request
GET https://opengraph.io/api/1.1/scrape/:site?app_id=xxxxxx
URL Parameters
Parameter | Required | Description |
---|---|---|
:site | true | The URL-encoded address of the website whose HTML you want to scrape. |
Query Parameters
Parameter | Required | Example | Description |
---|---|---|---|
app_id | yes | - | The API key for registered users. Create an account (no credit card required) to receive your app_id. |
cache_ok | no | false | Set this to false to force our servers to pull a fresh version of the site being requested. By default this value is true. |
full_render | no | false | This will fully render the site using a Chrome browser before parsing its contents. This is especially helpful for single-page applications and JS redirects, but it slows the response down by around 1.5 seconds. |
use_proxy | no | false | Route your request through residential and mobile proxies to avoid bot detection. This will slow down requests by 3-10 seconds and can cause requests to time out. NOTE: Proxies are a limited resource and expensive for our team to maintain. Free accounts share a small pool of proxies. If you plan on using proxies often, paid accounts provide dedicated concurrent proxies for your account. |
use_premium | no | false | The Premium Proxy feature in our API allows you to leverage residential and mobile proxy pools for enhanced scraping performance. |
use_superior | no | false | The Superior Proxy feature is designed to tackle the most demanding scraping scenarios, allowing you to overcome the challenges posed by highly restrictive websites. By leveraging our superior proxy option, you can bypass bot detection mechanisms and access data from even the toughest sources. |
auto_proxy | no | false | By default, Auto Proxy is turned on and will use a proxy for any domain that our team has identified as requiring one to bypass bot detection. Set the auto_proxy query parameter to false if you would like to turn the feature off. If you find a domain that needs a proxy but isn't being fetched with one automatically, please contact support and we will update our Auto Proxy settings accordingly. |
max_cache_age | no | 432000000 | This specifies the maximum age in milliseconds that a cached response may be. If not specified, the value is set to 5 days (5 days × 24 hours × 60 minutes × 60 seconds × 1000 ms = 432,000,000 ms). |
accept_lang | no | en-US,en;q=0.9 / auto | This specifies the request language sent when requesting the URL. This is useful if you want to get the site in languages other than English; the default setting returns an English version of a page if it exists. Note: if you specify the value auto, the API will use the same language settings as your current request. For more information on what to supply for this field, see: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language |
Screenshot Site
// JavaScript
const url = "https://opengraph.io/api/1.1/screenshot/:site?app_id=xxxxxx";

const fetchData = async () => {
  try {
    const response = await fetch(url);
    const data = await response.json();
    console.log(data);
  } catch (error) {
    console.log(error);
  }
};

fetchData();
# Ruby
require 'net/http'
require 'json'

url = "https://opengraph.io/api/1.1/screenshot/:site?app_id=xxxxxx"
begin
  uri = URI(url)
  response = Net::HTTP.get(uri)
  data = JSON.parse(response)
  puts data
rescue => e
  puts e.message
end
// C#
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json.Linq;

public class Program
{
    private static readonly string url = "https://opengraph.io/api/1.1/screenshot/:site?app_id=xxxxxx";

    // async Task (not async void) so Main awaits the request before exiting
    public static async Task Main()
    {
        using (HttpClient client = new HttpClient())
        {
            try
            {
                var response = await client.GetStringAsync(url);
                var data = JObject.Parse(response);
                Console.WriteLine(data);
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message);
            }
        }
    }
}
<?php
// PHP
$url = "https://opengraph.io/api/1.1/screenshot/:site?app_id=xxxxxx";
try {
    $response = file_get_contents($url);
    if ($response !== false) {
        $data = json_decode($response, true);
        print_r($data);
    }
} catch (Exception $e) {
    echo $e->getMessage();
}
?>
The above command returns JSON structured like this:
{
  "message": "Screenshot retrieved successfully",
  "screenshotUrl": "Url where the screenshot is stored"
}
HTTP Request
GET /api/1.1/screenshot/:site?app_id=xxxxxx
URL Parameters
Parameter | Required | Description |
---|---|---|
:site | true | The URL-encoded address of the website you want a screenshot of. |
Query Parameters
Parameter | Required | Default Value | Description |
---|---|---|---|
app_id | yes | - | The API key for registered users. Create an account (no credit card required) to receive your app_id. |
full_page | no | false | The full_page query parameter determines whether the screenshot should capture the visible viewport or the entire content of the page. |
dimensions | no | md | The dimensions query parameter sets the viewport of the screen. |
quality | no | 80 | The quality query parameter is used to specify the quality of the screenshot. |
Full Page Parameter
The full_page parameter determines whether the screenshot should capture the visible viewport or the entire content of the page. It can be set to one of the following values:
- true: Capture the entire content of the page.
- false: Capture only the visible viewport.
Dimensions Parameter
The dimensions parameter specifies the viewport of the screen and can be set to one of the following values:
Value | Width | Height |
---|---|---|
lg | 1920 | 1080 |
md | 1366 | 768 |
sm | 1024 | 768 |
xs | 375 | 812 |
Quality Parameter
The quality parameter specifies the image quality of the screenshot. The value should be set in intervals of 10, from 10 (the lowest quality) up to 80 (the highest quality).
Valid values: 10, 20, 30, 40, 50, 60, 70, 80
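The constraints above can be sketched as a small request builder. The helper names and the validation logic are ours; the viewport sizes and quality range come from the tables above:

```javascript
// Viewport sizes accepted by the dimensions parameter (values from the docs).
const DIMENSIONS = {
  lg: { width: 1920, height: 1080 },
  md: { width: 1366, height: 768 },
  sm: { width: 1024, height: 768 },
  xs: { width: 375, height: 812 },
};

// quality must be a multiple of 10 between 10 and 80, inclusive.
function isValidQuality(quality) {
  return Number.isInteger(quality) && quality >= 10 && quality <= 80 && quality % 10 === 0;
}

// Assemble a Screenshot request URL, rejecting out-of-range options early.
function buildScreenshotUrl(siteUrl, appId, { fullPage = false, dimensions = "md", quality = 80 } = {}) {
  if (!DIMENSIONS[dimensions]) throw new Error(`unknown dimensions: ${dimensions}`);
  if (!isValidQuality(quality)) throw new Error(`invalid quality: ${quality}`);
  return (
    `https://opengraph.io/api/1.1/screenshot/${encodeURIComponent(siteUrl)}` +
    `?app_id=${appId}&full_page=${fullPage}&dimensions=${dimensions}&quality=${quality}`
  );
}

console.log(buildScreenshotUrl("https://example.com", "xxxxxx", { dimensions: "lg", quality: 60 }));
```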
Embed Site
// JavaScript
const url = "https://opengraph.io/api/1.1/oembed/:site?app_id=xxxxxx";

const fetchData = async () => {
  try {
    const response = await fetch(url);
    const data = await response.json();
    console.log(data);
  } catch (error) {
    console.log(error);
  }
};

fetchData();
# Ruby
require 'net/http'
require 'json'

url = "https://opengraph.io/api/1.1/oembed/:site?app_id=xxxxxx"
begin
  uri = URI(url)
  response = Net::HTTP.get(uri)
  data = JSON.parse(response)
  puts data
rescue => e
  puts e.message
end
// C#
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json.Linq;

public class Program
{
    private static readonly string url = "https://opengraph.io/api/1.1/oembed/:site?app_id=xxxxxx";

    // async Task (not async void) so Main awaits the request before exiting
    public static async Task Main()
    {
        using (HttpClient client = new HttpClient())
        {
            try
            {
                var response = await client.GetStringAsync(url);
                var data = JObject.Parse(response);
                Console.WriteLine(data);
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message);
            }
        }
    }
}
<?php
// PHP
$url = "https://opengraph.io/api/1.1/oembed/:site?app_id=xxxxxx";
try {
    $response = file_get_contents($url);
    if ($response !== false) {
        $data = json_decode($response, true);
        print_r($data);
    }
} catch (Exception $e) {
    echo $e->getMessage();
}
?>
The above command returns JSON structured like this:
{
  "height": "Height",
  "width": "Width",
  "version": "1.0",
  "provider_name": "name of site you are requesting",
  "type": "type of embed",
  "html": "HTML for iframe."
}
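A consumer of the oEmbed response typically injects the html field into the page. A minimal sketch with a plain-link fallback (the embedMarkup helper is ours, not part of the API):

```javascript
// Pull the embeddable iframe markup out of an oEmbed response, falling
// back to a plain link when no html field is provided.
function embedMarkup(oembed, pageUrl) {
  if (oembed && typeof oembed.html === "string" && oembed.html.length > 0) {
    return oembed.html;
  }
  return `<a href="${pageUrl}">${pageUrl}</a>`;
}

const sample = {
  version: "1.0",
  type: "rich",
  html: "<iframe src='https://example.com'></iframe>",
};
console.log(embedMarkup(sample, "https://example.com"));
```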
HTTP Request
GET /api/1.1/oembed/:site?app_id=xxxxxx
URL Parameters
Parameter | Required | Description |
---|---|---|
:site | true | The URL-encoded address of the website you would like to embed. |
Query Parameters
Parameter | Required | Example | Description |
---|---|---|---|
app_id | yes | - | The API key for registered users. Create an account (no credit card required) to receive your app_id. |
cache_ok | no | false | Set this to false to force our servers to pull a fresh version of the site being requested. By default this value is true. |
full_render | no | false | This will fully render the site using a Chrome browser before parsing its contents. This is especially helpful for single-page applications and JS redirects, but it slows the response down by around 1.5 seconds. |
use_proxy | no | false | Route your request through residential and mobile proxies to avoid bot detection. This will slow down requests by 3-10 seconds and can cause requests to time out. NOTE: Proxies are a limited resource and expensive for our team to maintain. Free accounts share a small pool of proxies. If you plan on using proxies often, paid accounts provide dedicated concurrent proxies for your account. |
use_premium | no | false | The Premium Proxy feature in our API allows you to leverage residential and mobile proxy pools for enhanced scraping performance. |
use_superior | no | false | The Superior Proxy feature is designed to tackle the most demanding scraping scenarios, allowing you to overcome the challenges posed by highly restrictive websites. By leveraging our superior proxy option, you can bypass bot detection mechanisms and access data from even the toughest sources. |
auto_proxy | no | false | By default, Auto Proxy is turned on and will use a proxy for any domain that our team has identified as requiring one to bypass bot detection. Set the auto_proxy query parameter to false if you would like to turn the feature off. If you find a domain that needs a proxy but isn't being fetched with one automatically, please contact support and we will update our Auto Proxy settings accordingly. |
max_cache_age | no | 432000000 | This specifies the maximum age in milliseconds that a cached response may be. If not specified, the value is set to 5 days (5 days × 24 hours × 60 minutes × 60 seconds × 1000 ms = 432,000,000 ms). |
accept_lang | no | en-US,en;q=0.9 / auto | This specifies the request language sent when requesting the URL. This is useful if you want to get the site in languages other than English; the default setting returns an English version of a page if it exists. Note: if you specify the value auto, the API will use the same language settings as your current request. For more information on what to supply for this field, see: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language |
orientation | no | vertical / horizontal | This determines whether the embed will be vertical or horizontal. The default setting is vertical. |
always_og_frame | no | true / false | By default the API will attempt to use the site's native oEmbed implementation if it exists. If you supply true for this parameter, the API will ignore that and always use the opengraph.io frame. |
API Requests
Requests to the API vary in cost. A cached response only ever costs 1 request, while a call requiring JavaScript execution and a proxy can cost considerably more. We outline the variations in cost below.
How Requests Work
Each billing cycle allows for a specific number of requests, determined by your active plan. How many requests a single API call consumes depends on the query parameters you use.
Requests Used
You can view how many requests were used in the x-billing-request response header returned by the OpenGraph.io API.
Parameter | Requests Used | Description |
---|---|---|
request | 1 | A request is a single API call to our system. |
full_render | 10 | full_render is a request that will render the page in a headless browser and return the full HTML of the page. |
use_proxy | 10 | use_proxy will route a request through one of our proxy servers. This is useful for scraping sites that have basic scraping protection. |
use_premium | 20 | use_premium will route a request through one of our premium proxy servers. |
screenshot | 20 | A call to the screenshot API route will be charged at 20 requests. |
use_superior | 30 | use_superior tells opengraph.io to utilize a more advanced premium proxy which bypasses advanced scraping protection employed by sites such as LinkedIn, Amazon, etc. |
Auto Proxy | varies | The Auto Proxy feature instructs the API to automatically use a proxy whenever necessary. It can be turned off by setting the auto_proxy query parameter to false. If you encounter sites that require a proxy but are not being automatically scraped with one, please contact support. If a request triggers a proxy, the request will be billed for the proxy that was required. |
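The authoritative cost of any call is whatever the x-billing-request header reports. With fetch, it can be read like this (a plain Map stands in for the real Headers object in the example; Headers.get returns null for a missing header while Map.get returns undefined, and the sketch handles both):

```javascript
// Read the number of requests billed for a call from the
// x-billing-request response header. Returns null if the header is absent.
function requestsUsed(headers) {
  const value = headers.get("x-billing-request");
  return value == null ? null : Number(value);
}

// A Map stands in for fetch's Headers object here; in real code you would
// pass response.headers from a fetch() call.
const headers = new Map([["x-billing-request", "10"]]);
console.log(requestsUsed(headers)); // 10
```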
Auto Proxy
To make things as smooth and simple as possible, we have created the Auto Proxy feature to automatically use a proxy for sites with bot detection.
Auto Proxy is on by default for qualifying plans, but it can be turned off by setting the auto_proxy query parameter to false.
If you discover a site that needs a proxy that our Auto Proxy is not aware of, please feel free to contact support.