Opengraph.io

Introduction

Welcome to the API documentation for Opengraph.io! With our API, you can easily parse the HTML of any URL you provide for Open Graph tags.

We have language bindings in Node.js, Ruby, PHP, C#, and jQuery! You can view code examples in the dark area to the right, and you can switch the programming language of the examples with the tabs in the top right.

Authentication

https://opengraph.io/api/1.1/site/<URL encoded site URL>?app_id=xxxxxx

To authenticate your API requests, simply include your unique app_id query parameter with each request. This app_id serves as your API key and is provided to you upon registration.

To get started with the Opengraph.io API, you'll need to create a free account to receive your app_id. Opengraph.io Signup
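
As a minimal sketch of the request format above: the target site must be URL-encoded before it is placed in the path, and the app_id is sent as a query parameter. The helper name `buildSiteUrl` is hypothetical and not part of any Opengraph.io library; only the endpoint shape comes from this page.

```javascript
// Hypothetical helper: builds a Get Open Graph request URL.
// The endpoint shape is taken from this page; the function name is ours.
function buildSiteUrl(siteUrl, appId) {
  // encodeURIComponent escapes ":", "/", "?" and "=" so the target URL
  // survives as a single path segment.
  const encodedSite = encodeURIComponent(siteUrl);
  const query = new URLSearchParams({ app_id: appId });
  return `https://opengraph.io/api/1.1/site/${encodedSite}?${query}`;
}

const requestUrl = buildSiteUrl("https://example.com/a?b=1", "xxxxxx");
console.log(requestUrl);
```

Encoding matters most when the target URL carries its own query string, as in the example, because an unencoded "?" or "&" would otherwise be read as part of the API request.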

Sites

Get Open Graph

const url = "https://opengraph.io/api/1.1/site/:site?app_id=xxxxxx";

async function fetchData() {
  try {
    const response = await fetch(url);
    const data = await response.json();
    console.log(data);
  } catch (error) {
    console.error(error);
  }
}

fetchData();
require 'net/http'
require 'json'

url = URI("https://opengraph.io/api/1.1/site/:site?app_id=xxxxxx")

http = Net::HTTP.new(url.host, url.port)
http.use_ssl = true

request = Net::HTTP::Get.new(url)

response = http.request(request)

data = JSON.parse(response.body)

puts data
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json;

class Program
{
    static async Task Main(string[] args)
    {
        var url = "https://opengraph.io/api/1.1/site/:site?app_id=xxxxxx";

        using (var httpClient = new HttpClient())
        {
            using (var response = await httpClient.GetAsync(url))
            {
                string apiResponse = await response.Content.ReadAsStringAsync();
                dynamic data = JsonConvert.DeserializeObject(apiResponse);
                Console.WriteLine(data);
            }
        }
    }
}
$url = 'https://opengraph.io/api/1.1/site/:site?app_id=xxxxxx';

$curl = curl_init($url);

curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($curl);

$data = json_decode($response, true);

print_r($data);

The API returns JSON structured like this:

{
  "hybridGraph": {
    "title": "Example Title",
    "description": "Example Description",
    "type": "Example Type",
    "image": "https://example.com/image.png",
    "url": "https://example.com",
    "favicon": "https://example.com/favicon.ico",
    "site_name": "Example Site Name",
    "articlePublishedTime": "2023-03-23T00:00:00.000Z",
    "articleAuthor": "https://example.com/author"
  },
  "openGraph": {
    "title": "Example Title",
    "description": "Example Description",
    "type": "Example Type",
    "image": {
      "url": "https://example.com/image.png"
    },
    "url": "https://example.com",
    "site_name": "Example Site Name",
    "articlePublishedTime": "2023-03-23T00:00:00.000Z",
    "articleAuthor": "https://example.com/author"
  },
  "htmlInferred": {
    "title": "Example Title",
    "description": "Example Description",
    "type": "Example Type",
    "image": "https://example.com/image.png",
    "url": "https://example.com",
    "favicon": "https://example.com/favicon.ico",
    "site_name": "Example Site Name",
    "images": [
      "https://example.com/image1.png",
      "https://example.com/image2.png",
      "https://example.com/image3.png",
      "https://example.com/image4.png"
    ]
  },
  "requestInfo": {
    "redirects": 1,
    "host": "https://example.com",
    "responseCode": 200,
    "cache_ok": true,
    "max_cache_age": 432000000,
    "accept_lang": "en-US,en;q=0.9",
    "url": "https://example.com",
    "full_render": false,
    "use_proxy": false,
    "use_superior": false,
    "responseContentType": "text/html; charset=utf-8"
  },
  "accept_lang": "en-US,en;q=0.9",
  "is_cache": false,
  "url": "https://example.com"
}
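
One detail in the response above is easy to miss: `hybridGraph.image` is a plain string, while `openGraph.image` is an object with a `url` field. The sketch below shows one way to flatten a response into link-preview fields. Reading `hybridGraph` first and falling back to `openGraph` is a choice made for this example, and `toPreview` is a hypothetical name.

```javascript
// Hypothetical helper: extracts link-preview fields from an API response.
function toPreview(apiResponse) {
  const hybrid = apiResponse.hybridGraph ?? {};
  const og = apiResponse.openGraph ?? {};
  return {
    title: hybrid.title ?? og.title ?? null,
    description: hybrid.description ?? og.description ?? null,
    // hybridGraph.image is a string, while openGraph.image is an object.
    image: hybrid.image ?? og.image?.url ?? null,
    url: hybrid.url ?? apiResponse.url ?? null,
  };
}

// Trimmed-down sample in the shape of the response shown above.
const sample = {
  hybridGraph: { title: "Example Title", image: "https://example.com/image.png" },
  openGraph: {
    description: "Example Description",
    image: { url: "https://example.com/og.png" },
  },
  url: "https://example.com",
};
const preview = toPreview(sample);
console.log(preview);
```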

HTTP Request

GET https://opengraph.io/api/1.1/site/<URL encoded site URL>?app_id=xxxxxx

URL Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| :site | yes | The URL-encoded URL of the website from which you want Open Graph information. |

Query Parameters

| Parameter | Required | Example | Description |
| --- | --- | --- | --- |
| app_id | yes | - | The API key for registered users. Create an account (no credit card ever required) to receive your app_id. |
| cache_ok | no | false | Set this to false to force our servers to pull a fresh version of the requested site. Defaults to true. |
| full_render | no | false | Fully renders the site in a Chrome browser before parsing its contents. This is especially helpful for single-page applications and JS redirects. It adds around 1.5 seconds to the response time. |
| use_proxy | no | false | Routes your request through residential and mobile proxies to avoid bot detection. This slows requests down by 3-10 seconds and can cause requests to time out. NOTE: Proxies are a limited resource and expensive for our team to maintain. Free accounts share a small pool of proxies. If you plan on using proxies often, paid accounts provide dedicated concurrent proxies for your account. |
| use_superior | no | false | The superior proxy feature is designed to tackle the most demanding scraping scenarios, allowing you to overcome the challenges posed by highly restrictive websites. By leveraging our superior proxy option, you can bypass bot detection mechanisms and access data from even the toughest sources. |
| max_cache_age | no | 432000000 | The maximum age, in milliseconds, that a cached response may have. If not specified, the value is set to 5 days (5 days * 24 hours * 60 minutes * 60 seconds * 1000 ms = 432,000,000 ms). |
| accept_lang | no | en-US,en;q=0.9 or auto | The request language sent when requesting the URL. This is useful if you want to get the site in a language other than English. The default setting returns an English version of a page if it exists. Note: if you specify the value auto, the API uses the language settings of your current request. For more information on what to supply for this field, see: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language |
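
The optional parameters above are ordinary query-string values, and max_cache_age is expressed in milliseconds. The following sketch shows one way to attach them and to derive the millisecond value from a number of days. `buildSiteUrlWithOptions` and `MS_PER_DAY` are hypothetical names introduced for illustration.

```javascript
// Milliseconds in one day, used to express max_cache_age.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: appends optional query parameters to a site request.
function buildSiteUrlWithOptions(siteUrl, appId, options = {}) {
  const query = new URLSearchParams({ app_id: appId });
  for (const [name, value] of Object.entries(options)) {
    // Skip options left undefined so the API defaults apply.
    if (value !== undefined) query.set(name, String(value));
  }
  return `https://opengraph.io/api/1.1/site/${encodeURIComponent(siteUrl)}?${query}`;
}

// The documented default of 5 days, in milliseconds.
const defaultMaxCacheAge = 5 * MS_PER_DAY;

const oneDayUrl = buildSiteUrlWithOptions("https://example.com", "xxxxxx", {
  max_cache_age: 1 * MS_PER_DAY, // accept cached copies up to one day old
  accept_lang: "auto",
});
console.log(defaultMaxCacheAge);
console.log(oneDayUrl);
```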

Extract Site

const url =
  "https://opengraph.io/api/1.1/extract/:site?app_id=xxxxxx&html_elements=title,h1,h2,h3,h4,p";

const fetchData = async () => {
  try {
    const response = await fetch(url);
    const data = await response.json();
    console.log(data);
  } catch (error) {
    console.log(error);
  }
};

fetchData();
require 'net/http'
require 'json'

url = URI.parse("https://opengraph.io/api/1.1/extract/:site?app_id=xxxxxx&html_elements=title,h1,h2,h3,h4,p")
response = Net::HTTP.get_response(url)
data = JSON.parse(response.body)

puts data

using System;
using System.Net.Http;
using Newtonsoft.Json;

class Program
{
    static async System.Threading.Tasks.Task Main(string[] args)
    {
        var url = "https://opengraph.io/api/1.1/extract/:site?app_id=xxxxxx&html_elements=title,h1,h2,h3,h4,p";

        var httpClient = new HttpClient();
        var response = await httpClient.GetAsync(url);

        var jsonResponse = await response.Content.ReadAsStringAsync();
        dynamic data = JsonConvert.DeserializeObject(jsonResponse);

        Console.WriteLine(data);
    }
}

$url = 'https://opengraph.io/api/1.1/extract/:site?app_id=xxxxxx&html_elements=title,h1,h2,h3,h4,p';
$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$result = curl_exec($ch);

curl_close($ch);

$data = json_decode($result, true);

print_r($data);

The above command returns JSON structured like this:

{
  "tags": [
    {
      "tag": "title",
      "innerText": "example innerText",
      "position": 0
    },
    {
      "tag": "h1",
      "innerText": "example innerText",
      "position": 1
    },
    {
      "tag": "h2",
      "innerText": "example innerText",
      "position": 2
    },
    {
      "tag": "h2",
      "innerText": "example innerText",
      "position": 3
    },
    {
      "tag": "h2",
      "innerText": "example innerText",
      "position": 4
    },
    {
      "tag": "h3",
      "innerText": "example innerText",
      "position": 5
    },
    {
      "tag": "h3",
      "innerText": "example innerText",
      "position": 6
    },
    {
      "tag": "h4",
      "innerText": "example innerText",
      "position": 7
    },
    {
      "tag": "p",
      "innerText": "example innerText",
      "position": 8
    }
  ],
  "concatenatedText": "The concatenatedText property represents the combined text content of all the tags associated with the object, merged into a single string. This property provides a convenient way to access and manipulate the entire text content of the object at once, rather than having to iterate through each tag individually."
}

The extract endpoint enables you to extract information from any website by providing its URL. With this endpoint, you can extract any element you need from the website, including but not limited to the title, header elements (h1 to h5), and paragraph elements (p).
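
Because the response is a flat list of tags, a common first step is to group the text by element name. The sketch below does that, assuming that `position` reflects the order of elements on the page (this page does not state that explicitly). `groupByTag` is a hypothetical helper name.

```javascript
// Hypothetical helper: groups extract results by tag name.
// Assumption: "position" reflects the element order on the page.
function groupByTag(extractResponse) {
  const groups = {};
  // Copy before sorting so the original response is left untouched.
  const ordered = [...extractResponse.tags].sort((a, b) => a.position - b.position);
  for (const { tag, innerText } of ordered) {
    if (!groups[tag]) groups[tag] = [];
    groups[tag].push(innerText);
  }
  return groups;
}

// Small sample in the shape of the extract response shown above.
const extractSample = {
  tags: [
    { tag: "h2", innerText: "Second heading", position: 2 },
    { tag: "title", innerText: "Page title", position: 0 },
    { tag: "h2", innerText: "First heading", position: 1 },
  ],
};
const grouped = groupByTag(extractSample);
console.log(grouped);
```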

HTTP Request

GET https://opengraph.io/api/1.1/extract/:site?app_id=xxxxxx

URL Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| :site | yes | The URL-encoded URL of the website from which you want to extract elements. |

Query Parameters

| Parameter | Required | Example | Description |
| --- | --- | --- | --- |
| app_id | yes | - | The API key for registered users. Create an account (no credit card ever required) to receive your app_id. |
| html_elements | no | h1,h2,h3,p,span | A comma-separated list of the HTML element names you want to extract from the website. If this parameter is not supplied, the default elements extracted are h1, h2, h3, h4, h5, p, and title. |
| cache_ok | no | false | Set this to false to force our servers to pull a fresh version of the requested site. Defaults to true. |
| full_render | no | false | Fully renders the site in a Chrome browser before parsing its contents. This is especially helpful for single-page applications and JS redirects. It adds around 1.5 seconds to the response time. |
| use_proxy | no | false | Routes your request through residential and mobile proxies to avoid bot detection. This slows requests down by 3-10 seconds and can cause requests to time out. NOTE: Proxies are a limited resource and expensive for our team to maintain. Free accounts share a small pool of proxies. If you plan on using proxies often, paid accounts provide dedicated concurrent proxies for your account. |
| use_superior | no | false | The superior proxy feature is designed to tackle the most demanding scraping scenarios, allowing you to overcome the challenges posed by highly restrictive websites. By leveraging our superior proxy option, you can bypass bot detection mechanisms and access data from even the toughest sources. |
| max_cache_age | no | 432000000 | The maximum age, in milliseconds, that a cached response may have. If not specified, the value is set to 5 days (5 days * 24 hours * 60 minutes * 60 seconds * 1000 ms = 432,000,000 ms). |
| accept_lang | no | en-US,en;q=0.9 or auto | The request language sent when requesting the URL. This is useful if you want to get the site in a language other than English. The default setting returns an English version of a page if it exists. Note: if you specify the value auto, the API uses the language settings of your current request. For more information on what to supply for this field, see: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language |

Note

The response will only include the elements specified in the html_elements query or the default elements if the query is not supplied.

If the website does not contain any of the specified elements, the corresponding keys in the response will be empty lists.

Scrape Site

const url = "https://opengraph.io/api/1.1/scrape/:site?app_id=xxxxxx";

const fetchData = async () => {
  try {
    const response = await fetch(url);
    // The scrape endpoint returns raw HTML, so read the body as text.
    const html = await response.text();
    console.log(html);
  } catch (error) {
    console.error(error);
  }
};

fetchData();
require 'net/http'

url = URI.parse("https://opengraph.io/api/1.1/scrape/:site?app_id=xxxxxx")
response = Net::HTTP.get_response(url)

# The scrape endpoint returns raw HTML, not JSON.
puts response.body

using System;
using System.Net.Http;

class Program
{
    static async System.Threading.Tasks.Task Main(string[] args)
    {
        var url = "https://opengraph.io/api/1.1/scrape/:site?app_id=xxxxxx";

        var httpClient = new HttpClient();
        var response = await httpClient.GetAsync(url);

        // The scrape endpoint returns raw HTML, not JSON.
        var html = await response.Content.ReadAsStringAsync();

        Console.WriteLine(html);
    }
}

$url = 'https://opengraph.io/api/1.1/scrape/:site?app_id=xxxxxx';
$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

// The scrape endpoint returns raw HTML, not JSON.
$html = curl_exec($ch);

curl_close($ch);

echo $html;

The above command returns the raw HTML of a web page:

<html>
  ...
</html>

Just need the raw HTML?

The Scrape Site endpoint returns the raw HTML of a website, given its URL.
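
Since this endpoint returns HTML rather than JSON, the body should be read as text. A minimal sketch follows. `scrapeHtml` is a hypothetical helper, and its `fetchImpl` parameter is an assumption added so the function can be tried with a stand-in for `fetch`.

```javascript
// Hypothetical helper: fetches raw HTML from the scrape endpoint.
// fetchImpl is injectable so the function can be exercised without a network.
async function scrapeHtml(siteUrl, appId, fetchImpl = fetch) {
  const endpoint =
    `https://opengraph.io/api/1.1/scrape/${encodeURIComponent(siteUrl)}` +
    `?app_id=${encodeURIComponent(appId)}`;
  const response = await fetchImpl(endpoint);
  if (!response.ok) {
    throw new Error(`Scrape request failed with status ${response.status}`);
  }
  // The body is HTML, so read it with text() rather than json().
  return response.text();
}

// Usage (performs a real request, so it is left commented out):
// scrapeHtml("https://example.com", "xxxxxx").then((html) => console.log(html));
```

Calling `response.json()` on an HTML body would reject with a parse error, which is why the sketch uses `text()`.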

HTTP Request

GET https://opengraph.io/api/1.1/scrape/:site?app_id=xxxxxx

URL Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| :site | yes | The URL-encoded URL of the website whose HTML you want to scrape. |

Query Parameters

| Parameter | Required | Example | Description |
| --- | --- | --- | --- |
| app_id | yes | - | The API key for registered users. Create an account (no credit card ever required) to receive your app_id. |
| cache_ok | no | false | Set this to false to force our servers to pull a fresh version of the requested site. Defaults to true. |
| full_render | no | false | Fully renders the site in a Chrome browser before parsing its contents. This is especially helpful for single-page applications and JS redirects. It adds around 1.5 seconds to the response time. |
| use_proxy | no | false | Routes your request through residential and mobile proxies to avoid bot detection. This slows requests down by 3-10 seconds and can cause requests to time out. NOTE: Proxies are a limited resource and expensive for our team to maintain. Free accounts share a small pool of proxies. If you plan on using proxies often, paid accounts provide dedicated concurrent proxies for your account. |
| use_superior | no | false | The superior proxy feature is designed to tackle the most demanding scraping scenarios, allowing you to overcome the challenges posed by highly restrictive websites. By leveraging our superior proxy option, you can bypass bot detection mechanisms and access data from even the toughest sources. |
| max_cache_age | no | 432000000 | The maximum age, in milliseconds, that a cached response may have. If not specified, the value is set to 5 days (5 days * 24 hours * 60 minutes * 60 seconds * 1000 ms = 432,000,000 ms). |
| accept_lang | no | en-US,en;q=0.9 or auto | The request language sent when requesting the URL. This is useful if you want to get the site in a language other than English. The default setting returns an English version of a page if it exists. Note: if you specify the value auto, the API uses the language settings of your current request. For more information on what to supply for this field, see: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language |

Screenshot Site

const url =
  "https://opengraph.io/api/1.1/screenshot/:site?app_id=xxxxxx";

const fetchData = async () => {
  try {
    const response = await fetch(url);
    const data = await response.json();
    console.log(data);
  } catch (error) {
    console.log(error);
  }
};

fetchData();
require 'net/http'
require 'json'

url = "https://opengraph.io/api/1.1/screenshot/:site?app_id=xxxxxx"

begin
  uri = URI(url)
  response = Net::HTTP.get(uri)
  data = JSON.parse(response)
  puts data
rescue => e
  puts e.message
end
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Newtonsoft.Json.Linq;

public class Program
{
    private static readonly string url = "https://opengraph.io/api/1.1/screenshot/:site?app_id=xxxxxx";

    // Returns Task (not void) so the caller can await completion.
    public static async Task FetchDataAsync()
    {
        using (HttpClient client = new HttpClient())
        {
            try
            {
                var response = await client.GetStringAsync(url);
                var data = JObject.Parse(response);
                Console.WriteLine(data);
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message);
            }
        }
    }

    public static async Task Main()
    {
        await FetchDataAsync();
    }
}
<?php

$url = "https://opengraph.io/api/1.1/screenshot/:site?app_id=xxxxxx";

// file_get_contents() returns false on failure instead of throwing.
$response = file_get_contents($url);
if ($response === false) {
    echo "Request failed";
} else {
    $data = json_decode($response, true);
    print_r($data);
}

?>

The above command returns JSON structured like this:

{
  "message": "Screenshot retrieved successfully",
  "screenshotUrl": "Url where the screenshot is stored"
}

HTTP Request

GET https://opengraph.io/api/1.1/screenshot/:site?app_id=xxxxxx

URL Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| :site | yes | The URL-encoded URL of the website you want a screenshot of. |

Query Parameters

| Parameter | Required | Default Value | Description |
| --- | --- | --- | --- |
| app_id | yes | - | The API key for registered users. Create an account (no credit card ever required) to receive your app_id. |
| full_page | no | false | Determines whether the screenshot captures only the visible viewport or the entire content of the page. |
| dimensions | no | md | Sets the viewport of the screen. |
| quality | no | 80 | Specifies the quality of the screenshot. |

Full Page Parameter

The full_page parameter determines whether the screenshot captures only the visible viewport or the entire content of the page. It defaults to false, which captures only the visible viewport.

Dimensions Parameter

The dimensions parameter specifies the viewport of the screen. It defaults to md.

Quality Parameter

The quality parameter specifies the image quality of the screenshot. Set it in intervals of 10, from 10 up to 80 (the default).

Valid values: 10, 20, 30, 40, 50, 60, 70, 80
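
The quality rule above can be checked on the client before a request is sent. The sketch below builds a screenshot URL and rejects quality values outside the documented list. `buildScreenshotUrl` is a hypothetical helper, and passing full_page as true or false is an assumption based on its documented default of false.

```javascript
// Valid quality values, as listed above.
const VALID_QUALITIES = [10, 20, 30, 40, 50, 60, 70, 80];

// Hypothetical helper: builds a screenshot request URL.
// Assumption: full_page accepts true or false.
function buildScreenshotUrl(siteUrl, appId, { fullPage, quality } = {}) {
  const query = new URLSearchParams({ app_id: appId });
  if (fullPage !== undefined) query.set("full_page", String(fullPage));
  if (quality !== undefined) {
    if (!VALID_QUALITIES.includes(quality)) {
      throw new RangeError(`quality must be one of ${VALID_QUALITIES.join(", ")}`);
    }
    query.set("quality", String(quality));
  }
  return `https://opengraph.io/api/1.1/screenshot/${encodeURIComponent(siteUrl)}?${query}`;
}

const screenshotUrl = buildScreenshotUrl("https://example.com", "xxxxxx", {
  fullPage: true,
  quality: 60,
});
console.log(screenshotUrl);
```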

API Requests

Requests to the API vary in cost. A cached response only ever costs 1 request, while a request requiring JavaScript execution and a proxy could cost up to 20 requests. We outline the variations in cost below.

How Requests Work

Each billing cycle allows for a specific number of requests. The number of available requests is determined by your active plan. The number of requests that a single API call consumes depends on the query parameters you use.

Requests Used

| Request Type | Requests Used | Description |
| --- | --- | --- |
| request | 1 | A request is a single API call to our system. |
| full_render | 10 | full_render is a request that will render the page in a headless browser and return the full HTML of the page. |
| use_proxy | 10 | use_proxy will route a request through one of our proxy servers. This is useful for scraping sites that have basic scraping protection. |
| use_premium | 20 | use_premium will route a request through one of our premium proxy servers. |
| screenshot | 20 | A call to the screenshot API route will be charged at 20 requests. |
| use_superior | 30 | use_superior tells Opengraph.io to utilize a more advanced premium proxy which bypasses advanced scraping protection employed by sites such as LinkedIn, Amazon, etc. |
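
For budgeting, the table above can be turned into a rough estimator. This sketch assumes that the costs of enabled options add up and that a call with none of them is billed as 1 request. That matches the "up to 20" figure quoted for full_render plus use_proxy, but the exact billing rules are not spelled out on this page, so treat the result as an estimate. `estimateRequestCost` is a hypothetical name, and the screenshot route (a flat 20) is left out.

```javascript
// Request costs copied from the table above.
const OPTION_COSTS = {
  full_render: 10,
  use_proxy: 10,
  use_premium: 20,
  use_superior: 30,
};

// Assumption (not confirmed by the docs): costs of enabled options add up,
// and a call with no costly options is billed as 1 request.
function estimateRequestCost(options = {}) {
  let total = 0;
  for (const [name, cost] of Object.entries(OPTION_COSTS)) {
    if (options[name]) total += cost;
  }
  return total === 0 ? 1 : total;
}

console.log(estimateRequestCost({}));
console.log(estimateRequestCost({ full_render: true, use_proxy: true }));
```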