Node.js Web Scraper for Dynamic Websites
Q: Create a web scraper using Node.js that retrieves and parses data from a website, including features such as dynamic page loading, data validation, and error handling.
- NodeJS
- Mid level question
Explore all the latest NodeJS interview questions and answers
ExploreMost Recent & up-to date
100% Actual interview focused
Create NodeJS interview for FREE!
Sure! Here's an example of how to build a web scraper using Node.js and the Cheerio library. We'll be scraping data from a simple HTML page that lists popular movies.
First, let's create a new Node.js project and install the required dependencies:
mkdir scraper cd scraper npm init -y npm install axios cheerio
Now, let's create a scraper.js file and start coding:
const axios = require('axios'); const cheerio = require('cheerio'); const url = 'https://www.imdb.com/chart/moviemeter'; axios.get(url) .then(response => { const html = response.data; const $ = cheerio.load(html); const movies = []; $('tbody.lister-list tr').each((i, el) => { const title = $(el).find('.titleColumn a').text().trim(); const rating = $(el).find('.imdbRating strong').text().trim(); const poster = $(el).find('.posterColumn img').attr('src'); movies.push({ title, rating, poster }); }); console.log(movies); }) .catch(error => console.log(error));
Here's what this code does:
1. We start by defining the URL of the page we want to scrape.
2. We use the Axios library to send an HTTP GET request to the URL and get the HTML response.
3. We use the Cheerio library to load the HTML and create a jQuery-like object that we can use to select and manipulate elements.
4. We use Cheerio to select all the tr elements in the table body that lists the movies.
5. For each movie tr element, we extract the title, rating, and poster URL by selecting the appropriate elements and attributes using Cheerio.
6. We add the movie information to an array of movies.
7. Finally, we log the array of movies to the console.
To run the scraper, simply run node scraper.js in the terminal. The output should be an array of objects, each containing the title, rating, and poster URL of a popular movie.


