The Web Scraping Club | Pierluigi Vinciguerra | Substack

News, solutions and interviews about web scraping. In this Substack you will find weekly content about:

  • Web Scraping techniques
  • Interviews with key people in the industry
  • Anti-bot info and countermeasures
  • Real-world examples and code

The Web Scraping Club, by Pierluigi Vinciguerra, is a Substack publication with thousands of subscribers.

The Lab #52: Scraping with LLMs and ScrapeGraphAi - part 1

Are LLMs the Holy Grail for web scraping?

May 30 • Pierluigi Vinciguerra

The Lab #51: APIs with Bearer Token

Scraping data from API endpoints requiring Bearer Token

May 17 • Pierluigi Vinciguerra

Celebrating the 50th article of The Lab series

A brief review of the first 50 episodes of The Lab series

May 10 • Pierluigi Vinciguerra

The Lab #49: Bypassing Cloudflare with open source repositories

And my two cents about these solutions

May 3 • Pierluigi Vinciguerra

The Lab #48: Scraping with AWS Lambda

Using Serverless and Selenium on Lambda for gathering data

Apr 12 • Pierluigi Vinciguerra

The Lab #47: Scraping real time data with Python

Using WebSocket to scrape data from Bitstamp and Sofascore

Apr 4 • Pierluigi Vinciguerra

The Lab #46: Fingerprint injection in Playwright

A home-made solution to bypass anti-bots by changing your browser fingerprint.

Mar 29 • Pierluigi Vinciguerra

THE LAB #45: Bypassing Geo-fencing While Scraping

How to scrape websites that are banned in your country

Mar 22 • Pierluigi Vinciguerra

The Lab #44: Scraping the dark web

Scraping the dark web with Playwright and Brave

Mar 7 • Pierluigi Vinciguerra

The Lab #43: Scraping inventory data: why, how and where

How to use web scraping to get inventory data and estimate sales

Feb 29 • Pierluigi Vinciguerra

The Lab #42: Bypassing PerimeterX without a browser automation tool

Bypassing PerimeterX with free tools and without running a browser

Feb 23 • Pierluigi Vinciguerra

The Lab #41: Scrapoxy, the super proxy aggregator

How to use Scrapoxy in your web scraping architecture

Feb 15 • Pierluigi Vinciguerra

The Lab #40: start a web data monetization project with Data Boutique

Buying and selling high-quality web data has never been so easy.

Feb 8 • Pierluigi Vinciguerra

The Lab #39: Mouse movements in Playwright

How to move the mouse in Playwright to mimic human behavior

Feb 1 • Pierluigi Vinciguerra

The Lab #38: Bypassing Kasada for web scraping 2024 edition

Another article with tools and techniques to bypass an anti-bot

Jan 25 • Pierluigi Vinciguerra

The Lab #37: Bypassing Cloudflare with anti-detect browsers - Part 2

Using Kameleo to bypass Cloudflare bot detection

Jan 18 • Pierluigi Vinciguerra

The Lab #36: Bypassing Cloudflare with anti-detect browsers

Configuring GoLogin to bypass Cloudflare bot detection

Jan 11 • Pierluigi Vinciguerra

What to expect from The Lab posts in 2024

Why I’m writing “The Lab” articles and what to expect this new year

Jan 4 • Pierluigi Vinciguerra

The Lab #35: Bypassing PerimeterX with Python and Playwright

Bypassing PerimeterX with free Python tools in 2023.

Dec 21, 2023 • Pierluigi Vinciguerra

The Lab #34: Bypassing Datadome - End of 2023 Version

Is it possible to bypass Datadome today?

Dec 6, 2023 • Pierluigi Vinciguerra

THE LAB 33: Fingerprinting at different connection layers

How to create and test a scraper with a coherent fingerprint between the different layers

Nov 30, 2023 • Pierluigi Vinciguerra

THE LAB 32: hRequests vs anti-bots: a full benchmark

How does it perform against Cloudflare, Akamai, Datadome, PerimeterX and Kasada?

Nov 24, 2023 • Pierluigi Vinciguerra

THE LAB #31: Scraping location data using a world grid

Building a fundamental tool for scraping location data in a cost-effective way

Nov 9, 2023 • Pierluigi Vinciguerra

THE LAB #30: How to bypass Akamai protected website when nothing else works

And without paying for any commercial solution. An ode to trivial solutions.

Oct 27, 2023 • Pierluigi Vinciguerra

THE LAB #29: Bypass Cloudflare Bot Protection with Scrapy

Is it possible to bypass Cloudflare without using a headful browser?

Oct 13, 2023 • Pierluigi Vinciguerra

THE LAB #28: Deep dive on inventory levels tracking

A real-world example of scraping inventory levels from a heavily Akamai-protected website

Sep 28, 2023 • Pierluigi Vinciguerra

THE LAB #27: Inventory levels, the holy grail of web scraped data

How web scraping could give an advantage in estimating companies' revenues

Sep 14, 2023 • Pierluigi Vinciguerra

THE LAB #26: From internal API to insights.

Getting insights on the automotive industry by scraping a car resale website.

Sep 1, 2023 • Pierluigi Vinciguerra

THE LAB #25: Bypassing Perimeterx in 2023

How to bypass PerimeterX anti-bot solution using both free and commercial solutions

Aug 17, 2023 • Pierluigi Vinciguerra

THE LAB #24 - Bypassing Akamai using Proxidize

Scraping the H&M website to collect e-commerce data reliably.

Aug 3, 2023 • Pierluigi Vinciguerra

Buy cheaper plane tickets using a VPN: truth or myth?

Debunking the myth of different ticket prices from different countries

Jul 20, 2023 • Pierluigi Vinciguerra

THE LAB #22 - Scraping Akamai protected websites

How Zalando and Rakuten use Akamai to protect their websites and how to bypass this solution

Jul 6, 2023 • Pierluigi Vinciguerra

THE LAB #21 - Bypass anti-bot challenges with AI

How Nimble Browser performs against the most famous anti-bot solutions

Jun 22, 2023 • Pierluigi Vinciguerra

THE LAB #20 - AI powered web scrapers with Nimble Browser

How artificial intelligence makes web scraping easier

Jun 8, 2023 • Pierluigi Vinciguerra

THE LAB #19: How to mask your device fingerprint

Beating Cloudflare's fingerprinting is possible

May 26, 2023 • Pierluigi Vinciguerra

THE LAB #18: How to scrape Reddit with Scrapy

Scraping subreddits without any commercial product, in two different, easy ways.

May 11, 2023 • Pierluigi Vinciguerra

THE LAB #17: Creating a dataset for investors - Tesla (TSLA)

The process of creating a dataset for stock market analysts: Tesla

Apr 28, 2023 • Pierluigi Vinciguerra

THE LAB #16: How to scrape Datadome protected websites (early 2023 version)

Tools and techniques to scrape Datadome protected websites

Apr 14, 2023 • Pierluigi Vinciguerra

THE LAB #15: Deep diving into Apify world

Let’s use the new Python SDK to get to know the Apify ecosystem better

Mar 30, 2023 • Pierluigi Vinciguerra

THE LAB #14: Scraping Cloudflare Protected Websites (early 2023 version)

How to scrape Cloudflare protected websites in 2023

Mar 16, 2023 • Pierluigi Vinciguerra

THE LAB #13: Managing a fleet of scrapers with Scrapeops

Using Scrapeops dashboard to monitor your web scraping operations in large web scraping projects

Mar 2, 2023 • Pierluigi Vinciguerra

THE LAB #12: Reverse-engineering Mobile API

A step-by-step guide with Charles Proxy and Android Emulator

Feb 16, 2023 • Fabien Vauchelles

THE LAB #11: The Anti-Detect Anti-Bot matrix

An incomplete but still useful list of interesting resources on web scraping. Testing the most well-known web scraping tools in Python against…

Feb 2, 2023 • Pierluigi Vinciguerra

THE LAB #10: Bypass Cloudflare Bot Protection with GoLogin

A new way to scrape Cloudflare-protected websites using anti-detect browsers

Jan 19, 2023 • Pierluigi Vinciguerra

THE LAB #9: Scraping OpenSea NFT’s data

Getting winners and losers of the Bored Ape Yacht Club collection transactions

Jan 5, 2023 • Pierluigi Vinciguerra

THE LAB #8: Using Bezier curves for human-like mouse movements

What are Bezier curves and why are they important in web scraping?

Dec 9, 2022 • Pierluigi Vinciguerra

THE LAB #7: Scraping PerimeterX protected websites

Is scraping PerimeterX-protected websites as difficult as it seems?

Nov 24, 2022 • Pierluigi Vinciguerra

THE LAB #6: Changing Ciphers in Scrapy to avoid bans by TLS Fingerprinting

In other words: fake it until you scrape it

Nov 9, 2022 • Pierluigi Vinciguerra

The Lab #5 - Scraping Airbnb.com using GraphQL

What is GraphQL and why is it used so widely?

Oct 27, 2022 • Pierluigi Vinciguerra

THE LAB #4: Scrapyd - how to manage and schedule a fleet of scrapers

Pros and cons of current scheduling solutions for Scrapy

Oct 12, 2022 • Pierluigi Vinciguerra

THE LAB #3: Scraping Cloudflare protected websites

Without buying any external software, for real.

Sep 27, 2022 • Pierluigi Vinciguerra

THE LAB #2: scraping data from a website with Datadome and xsrf tokens

A real-world use case of a simple scraper that does not get blocked by Datadome

Sep 16, 2022 • Pierluigi Vinciguerra

THE LAB #1: Scraping data from an app

How to inspect the network traffic of an app with Fiddler Everywhere and scrape the data from its servers.

Sep 5, 2022 • Pierluigi Vinciguerra
