I needed to scrape property listings from a real estate site that relied heavily on JavaScript to load content. The page source was empty until React rendered everything, so traditional tools like requests + BeautifulSoup wouldn’t work. Selenium was the answer.

What I was scraping

Property data from a site that dynamically loaded listings as you scrolled. I needed to:

  1. Load the page
  2. Scroll to trigger lazy loading
  3. Wait for listings to appear
  4. Extract the data

The site had no public API, and while I could have reverse-engineered their internal API calls, using Selenium meant I didn’t have to worry about authentication or rate limiting logic.
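
Jumping ahead a little (driver setup is covered below), step 2 from the list above boiled down to a scroll loop like this sketch:

import time

# Sketch: scroll until the page stops growing.
# Assumes `driver` is an already-configured webdriver (see "Basic setup" below).
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    time.sleep(2)  # give lazy-loaded listings time to render
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break  # nothing new appeared; we've hit the bottom
    last_height = new_height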

Selenium vs Playwright vs Puppeteer

Selenium - The oldest and most mature option. Works with multiple browsers. Can be slow, but it’s stable and well-documented.

Playwright - Microsoft’s newer tool. Faster than Selenium and has better async support. More modern API. If I were starting fresh today, I’d probably use this.

Puppeteer - Google’s tool, Chrome-focused. Great if you only need Chrome/Chromium. Popular but now arguably superseded by Playwright.

I went with Selenium because:

  • Huge community means better Stack Overflow coverage
  • Works well with Chrome DevTools for debugging
  • I already knew it from previous projects

Being a good citizen

Before scraping any site:

  1. Check robots.txt - Visit https://example.com/robots.txt to see what the site allows. Respect the User-agent: * and Disallow: directives.

  2. Rate limit yourself - Add delays between requests. Using time.sleep(2) between page loads is a good starting point.

  3. Identify yourself in the User-Agent - Don’t pretend to be a regular browser if you’re a bot; set a User-Agent that says who you are and, ideally, how to reach you.

  4. Check for an API - Many sites have official APIs. Use those instead if they exist.

  5. Cache aggressively - Don’t request the same page twice. Save to disk and reuse.

The property site I scraped had a robots.txt that allowed /listings, so I was in the clear. I also rate-limited to one request every 3 seconds and ran the scrape overnight to avoid peak hours.
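
Steps 1 and 2 from that checklist are easy to automate. Here’s a rough sketch using Python’s built-in robotparser; the URLs and the 3-second delay are placeholders mirroring the setup described above:

import time
from urllib.robotparser import RobotFileParser

# Read robots.txt once, then rate-limit each page fetch.
rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

for url in ["https://example.com/listings?page=1", "https://example.com/listings?page=2"]:
    if not rp.can_fetch("*", url):
        continue  # disallowed by robots.txt; skip it
    # ... fetch and parse the page here ...
    time.sleep(3)  # one request every 3 seconds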

Installation

pip install selenium webdriver-manager

Basic setup

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(
    service=Service(ChromeDriverManager().install())  # downloads and caches a chromedriver matching your Chrome
)
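
Since I ran the scrape overnight anyway, headless mode is a natural fit. A variant of the same setup using standard Chrome options (not required by the basic setup above):

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")           # no visible browser window
options.add_argument("--window-size=1920,1080")  # lazy loading can depend on viewport size

driver = webdriver.Chrome(
    service=Service(ChromeDriverManager().install()),
    options=options,
)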

Open a page

driver.get("https://example.com")

Wait for element to load

This is crucial - never assume elements are ready immediately:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "myid"))  # polls until the element exists, up to 10 seconds
)
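
For lazy-loaded listings I waited for all matching elements rather than a single one. A sketch with a made-up listing selector:

# "div.listing-card" is an invented selector; use whatever the real listings match.
listings = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, "div.listing-card"))
)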

Find elements

driver.find_element(By.ID, "myid")
driver.find_element(By.CLASS_NAME, "myclass")
driver.find_element(By.CSS_SELECTOR, "div.myclass")
driver.find_elements(By.TAG_NAME, "a")  # returns list
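
The returned elements expose their text and attributes directly. For example, collecting every link target (purely illustrative):

for link in driver.find_elements(By.TAG_NAME, "a"):
    href = link.get_attribute("href")  # None if the anchor has no href
    if href:
        print(link.text, href)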

Get text

element.text

Click

element.click()

Always close when done

driver.quit()
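
Wrapping the run in try/finally makes sure the browser process dies even when the scrape throws:

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
try:
    driver.get("https://example.com")
    # ... scrape ...
finally:
    driver.quit()  # runs even if the scrape raises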

For simple HTML parsing after JS loads

Pass the rendered page source to BeautifulSoup:

from bs4 import BeautifulSoup
soup = BeautifulSoup(driver.page_source, "html.parser")

This is often easier than using Selenium’s selectors if you need to do complex HTML traversal.
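
For example, pulling fields out of each listing might look roughly like this (the class names are invented; inspect the real markup first):

# Walk each listing card and print address and price.
for card in soup.select("div.listing-card"):
    price = card.select_one("span.price")
    address = card.select_one("span.address")
    if price and address:
        print(address.get_text(strip=True), price.get_text(strip=True))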

Further reading