Introduction
Welcome to the API documentation for Opengraph.io! With our API, you can easily parse the HTML of any URL you provide and retrieve its Open Graph tags.
We have language bindings in NodeJS, Ruby, PHP, C#, and jQuery! You can view code examples in the dark area to the right, and you can switch the programming language of the examples with the tabs in the top right.
Authentication
https://opengraph.io/api/1.1/site/<URL encoded site URL>?app_id=xxxxxx
To authenticate your API requests, include your unique app_id as a query parameter on each request. This app_id serves as your API key and is provided to you upon registration.
To get started with the Opengraph.io API, you'll need to create a free account to receive your app_id. Opengraph.io Signup
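For example, here is a minimal Node sketch of building an authenticated request URL; the app_id and target site below are placeholders you would replace with your own values:

// Build an authenticated request URL for the Get Open Graph endpoint.
const APP_ID = "xxxxxx"; // placeholder: your app_id from Opengraph.io
const targetUrl = "https://example.com/some/page"; // placeholder: the page to inspect

// The target URL must be URL encoded before it is placed in the path.
const requestUrl =
  `https://opengraph.io/api/1.1/site/${encodeURIComponent(targetUrl)}?app_id=${APP_ID}`;

console.log(requestUrl);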
Sites
Get Open Graph
const url = "https://opengraph.io/api/1.1/site/:site?app_id=xxxxxx";
async function fetchData() {
try {
const response = await fetch(url);
const data = await response.json();
console.log(data);
} catch (error) {
console.error(error);
}
}
fetchData();
require 'net/http'
require 'json'
url = URI("https://opengraph.io/api/1.1/site/:site?app_id=xxxxxx")
http = Net::HTTP.new(url.host, url.port)
http.use_ssl = true
request = Net::HTTP::Get.new(url)
response = http.request(request)
data = JSON.parse(response.body)
puts data
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json;

class Program
{
    static async Task Main(string[] args)
    {
        var url = "https://opengraph.io/api/1.1/site/:site?app_id=xxxxxx";

        using (var httpClient = new HttpClient())
        {
            using (var response = await httpClient.GetAsync(url))
            {
                string apiResponse = await response.Content.ReadAsStringAsync();
                dynamic data = JsonConvert.DeserializeObject(apiResponse);
                Console.WriteLine(data);
            }
        }
    }
}
$url = 'https://opengraph.io/api/1.1/site/:site?app_id=xxxxxx';
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($curl);
$data = json_decode($response, true);
print_r($data);
The API returns JSON structured like this:
{
  "hybridGraph": {
    "title": "Example Title",
    "description": "Example Description",
    "type": "Example Type",
    "image": "https://example.com/image.png",
    "url": "https://example.com",
    "favicon": "https://example.com/favicon.ico",
    "site_name": "Example Site Name",
    "articlePublishedTime": "2023-03-23T00:00:00.000Z",
    "articleAuthor": "https://example.com/author"
  },
  "openGraph": {
    "title": "Example Title",
    "description": "Example Description",
    "type": "Example Type",
    "image": {
      "url": "https://example.com/image.png"
    },
    "url": "https://example.com",
    "site_name": "Example Site Name",
    "articlePublishedTime": "2023-03-23T00:00:00.000Z",
    "articleAuthor": "https://example.com/author"
  },
  "htmlInferred": {
    "title": "Example Title",
    "description": "Example Description",
    "type": "Example Type",
    "image": "https://example.com/image.png",
    "url": "https://example.com",
    "favicon": "https://example.com/favicon.ico",
    "site_name": "Example Site Name",
    "images": [
      "https://example.com/image1.png",
      "https://example.com/image2.png",
      "https://example.com/image3.png",
      "https://example.com/image4.png"
    ]
  },
  "requestInfo": {
    "redirects": 1,
    "host": "https://example.com",
    "responseCode": 200,
    "cache_ok": true,
    "max_cache_age": 432000000,
    "accept_lang": "en-US,en;q=0.9",
    "url": "https://example.com",
    "full_render": false,
    "use_proxy": false,
    "use_superior": false,
    "responseContentType": "text/html; charset=utf-8"
  },
  "accept_lang": "en-US,en;q=0.9",
  "is_cache": false,
  "url": "https://example.com"
}
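Once the JSON above has been parsed (for example with the fetchData sketch earlier in this section), the individual graphs can be read like any ordinary object. The field names below come from the sample response; the helper itself is only illustrative:

// `data` is the parsed JSON response shown above.
function summarize(data) {
  console.log("Title:", data.hybridGraph.title);
  console.log("Description:", data.hybridGraph.description);
  console.log("Image:", data.hybridGraph.image);

  // requestInfo describes how the page was fetched.
  console.log("Status:", data.requestInfo.responseCode);
  console.log("Redirects:", data.requestInfo.redirects);
}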
HTTP Request
GET https://opengraph.io/api/1.1/site/<URL encoded site URL>?app_id=xxxxxx
URL Parameters
Parameter | Required | Description |
---|---|---|
:site | true | The URL-encoded address of the website you want Open Graph information for. |
Query Parameters
Parameter | Required | Example | Description |
---|---|---|---|
app_id | yes | - | The API key for registered users. Create an account (no credit card ever required) to receive your app_id. |
cache_ok | no | false | Set this to false to force our servers to pull a fresh version of the site being requested. By default this value is true, so a cached response may be returned. |
full_render | no | false | This will fully render the site using a Chrome browser before parsing its contents. This is especially helpful for single-page applications and JS redirects. It will slow down the response by around 1.5 seconds. |
use_proxy | no | false | Route your request through residential and mobile proxies to avoid bot detection. This will slow down requests by 3-10 seconds and can cause requests to time out. NOTE: Proxies are a limited resource and expensive for our team to maintain. Free accounts share a small pool of proxies. If you plan on using proxies often, paid accounts provide dedicated concurrent proxies for your account. |
use_superior | no | false | The superior proxy feature is designed for the most demanding scraping scenarios, allowing you to overcome the challenges posed by highly restrictive websites. By leveraging our superior proxy option, you can bypass bot detection mechanisms and access data from even the toughest sources. |
max_cache_age | no | 432000000 | The maximum age, in milliseconds, that a cached response may be. If not specified, the value is set to 5 days (5 days × 24 hours × 60 minutes × 60 seconds × 1000 ms = 432,000,000 ms). |
accept_lang | no | en-US,en;q=0.9 auto | The request language sent when requesting the URL. This is useful if you want to get the site in a language other than English; the default setting returns an English version of a page if it exists. Note: if you specify the value auto, the API will use the same language settings as your current request. For more information on what to supply for this field, see: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language |
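To illustrate how these parameters combine, here is a small Node sketch that builds a request URL with full_render, max_cache_age, and accept_lang set; the target site is a placeholder:

const APP_ID = "xxxxxx";
const site = encodeURIComponent("https://example.com");

// URLSearchParams keeps the query string readable and correctly encoded.
const params = new URLSearchParams({
  app_id: APP_ID,
  full_render: "true",       // render in a headless browser before parsing
  max_cache_age: "86400000", // accept cached responses up to 1 day old (in ms)
  accept_lang: "en-US,en;q=0.9",
});

const requestUrl = `https://opengraph.io/api/1.1/site/${site}?${params}`;
console.log(requestUrl);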
Extract Site
const url =
  "https://opengraph.io/api/1.1/extract/:site?app_id=xxxxxx&html_elements=title,h1,h2,h3,h4,p";

const fetchData = async () => {
  try {
    const response = await fetch(url);
    const data = await response.json();
    console.log(data);
  } catch (error) {
    console.log(error);
  }
};

fetchData();
require 'net/http'
require 'json'
url = URI.parse("https://opengraph.io/api/1.1/extract/:site?app_id=xxxxxx&html_elements=title,h1,h2,h3,h4,p")
response = Net::HTTP.get_response(url)
data = JSON.parse(response.body)
puts data
using System;
using System.Net.Http;
using Newtonsoft.Json;

class Program
{
    static async System.Threading.Tasks.Task Main(string[] args)
    {
        var url = "https://opengraph.io/api/1.1/extract/:site?app_id=xxxxxx&html_elements=title,h1,h2,h3,h4,p";
        var httpClient = new HttpClient();
        var response = await httpClient.GetAsync(url);
        var jsonResponse = await response.Content.ReadAsStringAsync();
        dynamic data = JsonConvert.DeserializeObject(jsonResponse);
        Console.WriteLine(data);
    }
}
$url = 'https://opengraph.io/api/1.1/extract/:site?app_id=xxxxxx&html_elements=title,h1,h2,h3,h4,p';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
curl_close($ch);
$data = json_decode($result, true);
print_r($data);
The above command returns JSON structured like this:
{
  "tags": [
    {
      "tag": "title",
      "innerText": "example innerText",
      "position": 0
    },
    {
      "tag": "h1",
      "innerText": "example innerText",
      "position": 1
    },
    {
      "tag": "h2",
      "innerText": "example innerText",
      "position": 2
    },
    {
      "tag": "h2",
      "innerText": "example innerText",
      "position": 3
    },
    {
      "tag": "h2",
      "innerText": "example innerText",
      "position": 4
    },
    {
      "tag": "h3",
      "innerText": "example innerText",
      "position": 5
    },
    {
      "tag": "h3",
      "innerText": "example innerText",
      "position": 6
    },
    {
      "tag": "h4",
      "innerText": "example innerText",
      "position": 7
    },
    {
      "tag": "p",
      "innerText": "example innerText",
      "position": 8
    }
  ],
  "concatenatedText": "The concatenatedText property represents the combined text content of all the tags associated with the object, merged into a single string. This property provides a convenient way to access and manipulate the entire text content of the object at once, rather than having to iterate through each tag individually."
}
The extract endpoint enables you to extract information from any website by providing its URL. With this endpoint, you can extract any element you need from the website, including but not limited to the title, header elements (h1 to h5), and paragraph elements (p).
HTTP Request
GET https://opengraph.io/api/1.1/extract/:site?app_id=xxxxxx
URL Parameters
Parameter | Required | Description |
---|---|---|
:site | true | The URL-encoded address of the website you want to extract information from. |
Query Parameters
Parameter | Required | Example | Description |
---|---|---|---|
app_id | yes | - | The API key for registered users. Create an account (no credit card ever required) to receive your app_id. |
html_elements | no | h1,h2,h3,p,span | Specifies the HTML elements you want to extract from the website, as a comma-separated list of element names. If this parameter is not supplied, the default elements extracted are h1, h2, h3, h4, h5, p, and title. |
cache_ok | no | false | Set this to false to force our servers to pull a fresh version of the site being requested. By default this value is true, so a cached response may be returned. |
full_render | no | false | This will fully render the site using a Chrome browser before parsing its contents. This is especially helpful for single-page applications and JS redirects. It will slow down the response by around 1.5 seconds. |
use_proxy | no | false | Route your request through residential and mobile proxies to avoid bot detection. This will slow down requests by 3-10 seconds and can cause requests to time out. NOTE: Proxies are a limited resource and expensive for our team to maintain. Free accounts share a small pool of proxies. If you plan on using proxies often, paid accounts provide dedicated concurrent proxies for your account. |
use_superior | no | false | The superior proxy feature is designed for the most demanding scraping scenarios, allowing you to overcome the challenges posed by highly restrictive websites. By leveraging our superior proxy option, you can bypass bot detection mechanisms and access data from even the toughest sources. |
max_cache_age | no | 432000000 | The maximum age, in milliseconds, that a cached response may be. If not specified, the value is set to 5 days (5 days × 24 hours × 60 minutes × 60 seconds × 1000 ms = 432,000,000 ms). |
accept_lang | no | en-US,en;q=0.9 auto | The request language sent when requesting the URL. This is useful if you want to get the site in a language other than English; the default setting returns an English version of a page if it exists. Note: if you specify the value auto, the API will use the same language settings as your current request. For more information on what to supply for this field, see: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language |
Note
The response will only include the elements specified in the html_elements query or the default elements if the query is not supplied.
If the website does not contain any of the specified elements, the corresponding keys in the response will be empty lists.
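As a rough sketch of consuming this response in Node (building on the fetch example above; the field names come from the sample response and the helper is only illustrative):

// `data` is the parsed JSON returned by the extract endpoint.
function printHeadings(data) {
  // Keep only the heading tags, in the order they appeared on the page.
  const headings = data.tags
    .filter((t) => ["h1", "h2", "h3", "h4"].includes(t.tag))
    .sort((a, b) => a.position - b.position);

  for (const { tag, innerText } of headings) {
    console.log(`${tag}: ${innerText}`);
  }

  // concatenatedText holds the combined text of all returned tags.
  console.log(`Total extracted text length: ${data.concatenatedText.length}`);
}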
Scrape Site
const url =
  "https://opengraph.io/api/1.1/scrape/:site?app_id=xxxxxx&html_elements=title,h1,h2,h3,h4,p";

const fetchData = async () => {
  try {
    const response = await fetch(url);
    // The scrape endpoint returns raw HTML, so read the body as text.
    const html = await response.text();
    console.log(html);
  } catch (error) {
    console.log(error);
  }
};

fetchData();
require 'net/http'

url = URI.parse("https://opengraph.io/api/1.1/scrape/:site?app_id=xxxxxx&html_elements=title,h1,h2,h3,h4,p")
response = Net::HTTP.get_response(url)

# The scrape endpoint returns raw HTML, so print the body as-is.
puts response.body
using System;
using System.Net.Http;

class Program
{
    static async System.Threading.Tasks.Task Main(string[] args)
    {
        var url = "https://opengraph.io/api/1.1/scrape/:site?app_id=xxxxxx&html_elements=title,h1,h2,h3,h4,p";
        var httpClient = new HttpClient();
        var response = await httpClient.GetAsync(url);

        // The scrape endpoint returns raw HTML, so read the body as a string.
        var html = await response.Content.ReadAsStringAsync();
        Console.WriteLine(html);
    }
}
$url = 'https://opengraph.io/api/1.1/scrape/:site?app_id=xxxxxx&html_elements=title,h1,h2,h3,h4,p';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
curl_close($ch);

// The scrape endpoint returns raw HTML, so output the response directly.
echo $result;
The above command returns the raw HTML of the web page:
<html>
...
</html>
Just need the raw HTML?
The Scrape Site endpoint returns the raw HTML of a website, given its URL.
HTTP Request
GET https://opengraph.io/api/1.1/scrape/:site?app_id=xxxxxx
URL Parameters
Parameter | Required | Description |
---|---|---|
:site | true | The URL-encoded address of the website you want to scrape. |
Query Parameters
Parameter | Required | Example | Description |
---|---|---|---|
app_id | yes | - | The API key for registered users. Create an account (no credit card ever required) to receive your app_id. |
cache_ok | no | false | Set this to false to force our servers to pull a fresh version of the site being requested. By default this value is true, so a cached response may be returned. |
full_render | no | false | This will fully render the site using a Chrome browser before parsing its contents. This is especially helpful for single-page applications and JS redirects. It will slow down the response by around 1.5 seconds. |
use_proxy | no | false | Route your request through residential and mobile proxies to avoid bot detection. This will slow down requests by 3-10 seconds and can cause requests to time out. NOTE: Proxies are a limited resource and expensive for our team to maintain. Free accounts share a small pool of proxies. If you plan on using proxies often, paid accounts provide dedicated concurrent proxies for your account. |
use_superior | no | false | The superior proxy feature is designed for the most demanding scraping scenarios, allowing you to overcome the challenges posed by highly restrictive websites. By leveraging our superior proxy option, you can bypass bot detection mechanisms and access data from even the toughest sources. |
max_cache_age | no | 432000000 | The maximum age, in milliseconds, that a cached response may be. If not specified, the value is set to 5 days (5 days × 24 hours × 60 minutes × 60 seconds × 1000 ms = 432,000,000 ms). |
accept_lang | no | en-US,en;q=0.9 auto | The request language sent when requesting the URL. This is useful if you want to get the site in a language other than English; the default setting returns an English version of a page if it exists. Note: if you specify the value auto, the API will use the same language settings as your current request. For more information on what to supply for this field, see: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language |
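Because the response body is the raw HTML itself rather than JSON, a common Node pattern is to read it as text and write it to disk; a minimal sketch (the output filename is arbitrary):

const fs = require("fs/promises");

const APP_ID = "xxxxxx";
const site = encodeURIComponent("https://example.com");
const url = `https://opengraph.io/api/1.1/scrape/${site}?app_id=${APP_ID}`;

async function saveHtml() {
  const response = await fetch(url);
  const html = await response.text(); // raw HTML, not JSON
  await fs.writeFile("page.html", html);
  console.log(`Saved ${html.length} characters to page.html`);
}

saveHtml().catch(console.error);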
Screenshot Site
const url =
  "https://opengraph.io/api/1.1/screenshot/:site?app_id=xxxxxx";

const fetchData = async () => {
  try {
    const response = await fetch(url);
    const data = await response.json();
    console.log(data);
  } catch (error) {
    console.log(error);
  }
};

fetchData();
require 'net/http'
require 'json'

url = "https://opengraph.io/api/1.1/screenshot/:site?app_id=xxxxxx"

begin
  uri = URI(url)
  response = Net::HTTP.get(uri)
  data = JSON.parse(response)
  puts data
rescue => e
  puts e.message
end
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json.Linq;

public class Program
{
    private static readonly string url = "https://opengraph.io/api/1.1/screenshot/:site?app_id=xxxxxx";

    // async Task (rather than async void) lets Main await the request and surface errors.
    public static async Task Main()
    {
        using (HttpClient client = new HttpClient())
        {
            try
            {
                var response = await client.GetStringAsync(url);
                var data = JObject.Parse(response);
                Console.WriteLine(data);
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message);
            }
        }
    }
}
<?php
$url = "https://opengraph.io/api/1.1/screenshot/:site?app_id=xxxxxx";

// file_get_contents returns false on failure rather than throwing an exception.
$response = file_get_contents($url);

if ($response !== false) {
    $data = json_decode($response, true);
    print_r($data);
} else {
    echo "Request failed";
}
?>
The above command returns JSON structured like this:
{
  "message": "Screenshot retrieved successfully",
  "screenshotUrl": "URL where the screenshot is stored"
}
HTTP Request
GET https://opengraph.io/api/1.1/screenshot/:site?app_id=xxxxxx
URL Parameters
Parameter | Required | Description |
---|---|---|
:site | true | The URL-encoded address of the website you want to capture a screenshot of. |
Query Parameters
Parameter | Required | Default Value | Description |
---|---|---|---|
app_id | yes | - | The API key for registered users. Create an account (no credit card ever required) to receive your app_id. |
full_page | no | false | The full_page query parameter determines whether the screenshot should capture the visible viewport or the entire content of the page. |
dimensions | no | md | The dimensions query parameter sets the viewport of the screen. |
quality | no | 80 | The quality query parameter is used to specify the quality of the screenshot. |
Full Page Parameter
The full_page parameter determines whether the screenshot should capture the visible viewport or the entire content of the page. The parameter can be set to one of the following values:
- true: Capture the entire content of the page.
- false: Capture the visible viewport.
Dimensions Parameter
The dimensions parameter specifies the viewport of the screen and can be set to one of the following values:
- lg: width 1920, height 1080
- md: width 1366, height 768
- sm: width 1024, height 768
- xs: width 375, height 812
Quality Parameter
The quality parameter specifies the image quality of the screenshot. The value should be set in intervals of 10, starting from 10 up to 80, where 10 is the lowest quality and 80 is the highest quality.
Valid values: 10, 20, 30, 40, 50, 60, 70, 80
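Putting these parameters together, here is a minimal Node sketch that requests a full-page screenshot at the lg viewport and then downloads the image from the returned screenshotUrl (the output filename and image format are assumptions for the example):

const fs = require("fs/promises");

const APP_ID = "xxxxxx";
const site = encodeURIComponent("https://example.com");

const params = new URLSearchParams({
  app_id: APP_ID,
  full_page: "true", // capture the entire page, not just the viewport
  dimensions: "lg",  // 1920 x 1080 viewport
  quality: "60",     // image quality, in steps of 10 from 10 to 80
});

async function takeScreenshot() {
  const response = await fetch(`https://opengraph.io/api/1.1/screenshot/${site}?${params}`);
  const data = await response.json();
  console.log(data.message);

  // Fetch the stored image and write it to disk.
  const image = await fetch(data.screenshotUrl);
  await fs.writeFile("screenshot.png", Buffer.from(await image.arrayBuffer()));
}

takeScreenshot().catch(console.error);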
API Credits
The API Credit system is a flexible and transparent way to manage usage and costs for our API services. You can easily track and control your usage based on the number of credits consumed by each API request.
How Credits Work
When you send a request, it consumes credits from your allocated quota. The number of available credits is determined by your active plan. How many credits each request uses depends on the query parameters you supply.
API Credit Costs
Parameter | Cost | Description |
---|---|---|
request | 1 | A request is a single API call to our system. |
full_render | 10 | full_render is a request that renders the page in a headless browser and returns the full HTML of the page. |
use_proxy | 10 | use_proxy routes a request through one of our proxy servers. This is useful for scraping sites that have basic scraping protection. |
use_premium | 10 | use_premium routes a request through one of our premium proxy servers. |
screenshot | 20 | A call to the screenshot API route. |
use_superior | 30 | use_superior tells opengraph.io to utilize a more advanced premium proxy which bypasses advanced scraping protection employed by sites such as LinkedIn, Amazon, etc. |
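As a rough illustration of how these costs add up (assuming the per-feature costs in the table are additive on top of the base request cost, which is an assumption of this sketch), the hypothetical helper below estimates the credits a single request will consume:

// Per-feature credit costs, copied from the table above.
const CREDIT_COSTS = {
  request: 1,
  full_render: 10,
  use_proxy: 10,
  use_premium: 10,
  screenshot: 20,
  use_superior: 30,
};

// Hypothetical helper: estimate the credits one request will consume.
function estimateCredits(features) {
  let total = CREDIT_COSTS.request;
  for (const feature of features) {
    total += CREDIT_COSTS[feature] ?? 0;
  }
  return total;
}

// e.g. a fully rendered request routed through a proxy:
// 1 (request) + 10 (full_render) + 10 (use_proxy) = 21 credits
console.log(estimateCredits(["full_render", "use_proxy"]));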