Dabbling with Puppeteer

Posted on 2023-04-27 Edited on 2023-06-18 In Puppeteer

Puppeteer is a Node.js library that provides a high-level API to control headless Chrome or Chromium browsers. It is widely used for web scraping, testing, and automation, and is an essential tool for many developers who work with web applications.

Note that I’ll be demonstrating on Arch Linux.

Because Puppeteer runs on Node.js, the first step is to create a project directory, initialize npm, and install the package.

mkdir puppeteer-project
cd puppeteer-project
npm init -y
npm i puppeteer --save

# Puppeteer requires some additional dependencies to be installed
sudo pacman -S libx11 libxcomposite libxdamage libxext libxi libxtst nss freetype2 harfbuzz
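The script later in this post points Puppeteer at /usr/bin/chromium, so it can help to confirm that the binary is actually there before going further. Here is a minimal sketch of such a check in plain Node.js. `findExecutable` is a hypothetical helper written for this post, not part of Puppeteer, and the path assumes Chromium was installed from the Arch repositories.

```javascript
const fs = require('fs');

// Hypothetical helper (not part of Puppeteer): returns the first path in
// `candidates` that exists and is executable, or null if none qualifies.
function findExecutable(candidates) {
  for (const candidate of candidates) {
    try {
      fs.accessSync(candidate, fs.constants.X_OK);
      return candidate;
    } catch {
      // Missing or not executable; try the next candidate.
    }
  }
  return null;
}

// Assumption: prefer the environment variable if it is set, then fall back
// to the path used in this post.
const candidates = [process.env.PUPPETEER_EXECUTABLE_PATH, '/usr/bin/chromium'].filter(Boolean);
const found = findExecutable(candidates);
console.log(found ? `Found browser at ${found}` : 'No browser binary found; install Chromium first');
```

If nothing is found, installing the chromium package with pacman should provide the binary at that path.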

Now we write our script:

vim puppeteer.js

The script should be along the lines of this template:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    executablePath: '/usr/bin/chromium',
    // We can also drop the executablePath line above and instead set an
    // environment variable in Bash:
    // `$ export PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium`
    headless: true
  });
  const page = await browser.newPage();
  await page.goto('https://www.example.com');
  await page.screenshot({ path: 'example.png' });
  await browser.close();
})();

This code launches Chromium in headless mode, navigates to https://www.example.com, saves a screenshot of the page as example.png, and then closes the browser.
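One weakness of the template is that if navigation or the screenshot throws, `browser.close()` is never reached and a Chromium process may be left running. Here is a rough sketch of one way to guard against that, with the executable path read from the environment variable mentioned in the template's comments. `buildLaunchOptions` and `takeScreenshot` are hypothetical names made up for this post, and the fallback path is the same Arch Linux assumption as before.

```javascript
// Hypothetical helper: builds the options object passed to puppeteer.launch().
// Uses PUPPETEER_EXECUTABLE_PATH when set, otherwise the path from this post.
function buildLaunchOptions(env = process.env) {
  return {
    executablePath: env.PUPPETEER_EXECUTABLE_PATH || '/usr/bin/chromium',
    headless: true
  };
}

// Hypothetical wrapper around the template: same steps, but the browser is
// closed in a finally block so it shuts down even when a step fails.
async function takeScreenshot(url, outputPath) {
  // Loaded here so the helper above can be used without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch(buildLaunchOptions());
  try {
    const page = await browser.newPage();
    await page.goto(url);
    await page.screenshot({ path: outputPath });
  } finally {
    await browser.close();
  }
}
```

Calling `takeScreenshot('https://www.example.com', 'example.png').catch(console.error)` should then behave like the template, with the difference that errors are logged and the browser is closed either way.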

More content about Puppeteer coming up!