Scraping Site Metadata with Node

Justin Furniss

February 22, 2020



Getting started with scraping metadata from websites using Node.js is pretty easy, especially for newer sites. The difficulty comes once you want to support the long tail of older or non-standard sites out there. Supporting that long tail is where OpenGraph.io can help.

First, let's initialize a new Node project:

npm init

Next, we'll install the opengraph-io npm module, along with Express, which the example app uses for its web server:

npm install opengraph-io express

Finally, let's scrape some websites using our Node app:

var opengraph = require('opengraph-io')({appId: 'xxxxxx'}); // <-- Enter your app_id!
var express = require('express');
var app = express();

app.get('/site/info', function (req, res) {
  var siteUrl = req.query['url'];

  opengraph.getSiteInfo(siteUrl, function (err, siteInfo) {
    console.log('hey err', err);
    console.log('hey result', siteInfo);
    res.json(siteInfo);
  });
});

app.listen(3000, function () {
  console.log('Example app listening on port 3000!');
  console.log('Test this proxy with the following url:', 'http://localhost:3000/site/info?url=https%3A%2F%2Fnytimes.com');
});
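The test URL logged by the app has its target site percent-encoded in the url query parameter. To try other sites, a small helper along these lines may be handy. It is only a sketch: buildTestUrl and PROXY_BASE are hypothetical names, and the base assumes the port and route from the example app.

```javascript
// Base of the local proxy route; assumes the example app is listening on port 3000.
const PROXY_BASE = 'http://localhost:3000/site/info';

// Hypothetical helper: builds a proxy test URL for any target site.
function buildTestUrl(targetUrl) {
  // encodeURIComponent escapes ':' and '/' so the whole target URL
  // travels as a single query-string value.
  return PROXY_BASE + '?url=' + encodeURIComponent(targetUrl);
}

console.log(buildTestUrl('https://nytimes.com'));
// prints http://localhost:3000/site/info?url=https%3A%2F%2Fnytimes.com
```

Express decodes the query string, so req.query['url'] in the route handler receives the original, unencoded URL.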

You'll notice that OpenGraph.io returns OpenGraph tags when they are available. If any (or all) tags are missing from a site, OpenGraph.io will infer what the OpenGraph tags probably should be. It doesn't get much easier than that.
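If you want to tell a site's own tags apart from inferred ones in your app, a fallback helper might look like the sketch below. It assumes the result separates tags found on the page (an openGraph section) from inferred values (an htmlInferred section). Those section names, the field names, and the sample object are assumptions for illustration, so check them against the siteInfo your app actually logs.

```javascript
// Hypothetical helper: prefer a tag the site declared itself,
// then fall back to an inferred value.
// The section names 'openGraph' and 'htmlInferred' are assumed, not confirmed here.
function pickField(siteInfo, field) {
  const sections = ['openGraph', 'htmlInferred'];
  for (const name of sections) {
    const section = siteInfo && siteInfo[name];
    if (section && section[field]) {
      return section[field];
    }
  }
  return null; // neither section has a usable value
}

// Made-up sample data, not a real API response.
const sample = {
  openGraph: { title: 'Explicit title' },
  htmlInferred: { title: 'Inferred title', description: 'Inferred description' }
};

console.log(pickField(sample, 'title'));       // prints Explicit title
console.log(pickField(sample, 'description')); // prints Inferred description
```

Keeping the fallback order in one place makes it easy to adjust if the response turns out to be structured differently.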

Since OpenGraph.io is a hosted service that is free for most users, you won't have to worry about scaling infrastructure or supporting that long tail of sites yourself. Anytime you come across a site that isn't well supported, just drop us a line.
