From Good to Great: Enhancing OpenGraph.io Product Scraping With Our Own ‘Extract’ API

Blake Archer

September 5, 2023

5 min


Introduction

In today's digital age, data is gold. But accessing this treasure isn't always straightforward. Every online business, at some point, confronts the intricate web of data extraction. Here's a nugget of wisdom, though: there's always a way around a challenge. Enter OpenGraph.io, your friendly companion in navigating the maze of web scraping.

OpenGraph.io's Dedication to Excellence: Enhancing with the ‘Extract’ API

At OpenGraph.io, we're a bit obsessed (in a good way!) with perfection. We roll up our sleeves and dive deep, updating our scrapers daily so you can focus on what you do best. Recently, in our quest for betterment, we plunged into the world of product scraping. Our unfurling endpoint already does a good job of retrieving product information, but we are always looking for ways to improve. So I set out to find the biggest online retailers, and in doing so I stumbled on some intriguing articles about the 'Top 40 Biggest Online Retailers'. Extracting meaningful insights from them, however, posed a challenge: manually going back and forth between Google Sheets and the articles to keep track of which product information we were accurately capturing, although exhilarating, was not exactly my idea of a good time.

Confronting Challenges: The 'Extract' Endpoint to the Rescue

Now, instead of getting overwhelmed, I turned to our very own ‘Extract’ endpoint. Using the browser's developer tools, I quickly identified the elements the retailers’ names were nested under. Then I could simply pass the elements I needed to grab along in a quick query to our API.

const axios = require('axios');

// Ask the Extract endpoint for every h2 > a element on the target page --
// that's where the retailer names live in this article.
const config = {
  method: 'get',
  url: 'https://opengraph.io/api/1.1/extract/https%3A%2F%2Finfluencermarketinghub.com%2Fecommerce-companies-usa%2F?accept_lang=auto&app_id=••••••&html_elements=h2>a',
};

axios.request(config)
  .then((response) => {
    console.log(JSON.stringify(response.data));
  })
  .catch((error) => {
    console.log(error);
  });

Below is a snippet of the response from the above code.

A screenshot of the response from the Extract endpoint.

The 'Extract' route also provides a concatenatedText property that combines all the information into a single string.

A screenshot of the concatenatedText property in the response from the Extract endpoint.
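
If all you need is a flat list of names, that single string is often enough. Below is a minimal sketch of how you might split it back apart; the exact shape of the response body and the newline delimiter are assumptions for illustration, so adjust them to whatever your own response looks like.

const axios = require('axios');

// Re-using the same Extract request as above; only the response handling differs.
const config = {
  method: 'get',
  url: 'https://opengraph.io/api/1.1/extract/https%3A%2F%2Finfluencermarketinghub.com%2Fecommerce-companies-usa%2F?accept_lang=auto&app_id=••••••&html_elements=h2>a',
};

axios.request(config)
  .then((response) => {
    // Assumption: concatenatedText sits at the top level of the response body
    // and separates entries with newlines -- adjust to match what you get back.
    const retailerNames = response.data.concatenatedText
      .split('\n')
      .map((name) => name.trim())
      .filter((name) => name.length > 0);

    console.log(retailerNames);
  })
  .catch((error) => {
    console.log(error);
  });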

Using the information we got back from the OpenGraph.io ‘Extract’ API endpoint, I was able to systematically and efficiently work through the web’s most popular online retailers to figure out what data we were getting, and, more importantly, what we were missing, so we could refine OpenGraph.io even further. From product descriptions and prices to related products, I noticed that while many sites vary wildly, there were glaring similarities we could use to enhance the unfurling API endpoint so it consistently captures more and more product information.
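
To give a flavor of what that audit looked like, here is a rough sketch of the kind of script I leaned on. The retailer URLs are placeholders, and the field names and response shape (a hybridGraph object from the unfurling/site endpoint) are assumptions for illustration, so treat it as a starting point rather than a finished tool.

const axios = require('axios');

// Placeholder product pages -- the real list came from the Extract results above.
const productPages = [
  'https://www.example-retailer-one.com/product/123',
  'https://www.example-retailer-two.com/item/456',
];

// Assumption: the unfurling (site) endpoint returns a hybridGraph object with
// these product-related fields; adjust the paths to match your response.
const fieldsToCheck = ['title', 'description', 'price', 'image'];

async function auditProductFields() {
  for (const pageUrl of productPages) {
    const url = `https://opengraph.io/api/1.1/site/${encodeURIComponent(pageUrl)}?app_id=••••••`;
    const { data } = await axios.get(url);

    // Record which fields came back empty so we know where to improve.
    const missing = fieldsToCheck.filter((field) => !data.hybridGraph?.[field]);
    console.log(pageUrl, missing.length ? `missing: ${missing.join(', ')}` : 'all fields present');
  }
}

auditProductFields().catch((error) => console.log(error));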

Unraveling OpenGraph.io's Extract Endpoint

Web scraping can be immensely time-consuming and challenging. Even after you invest significant effort, numerous variables remain unpredictable; like a newborn, a scraper demands constant attention and adaptation. Fortunately, we love handling the complexities so you can remain worry-free.

Conclusion: The OpenGraph.io Difference

In sum, this has been our journey from identifying a challenge to transforming it into a development opportunity. With OpenGraph.io, we aspire to redefine the web scraping landscape, bringing together efficiency, data collection, and stellar customer service. We trust you'll find OpenGraph.io as transformative as we do.