The Web Scraping Club | Pierluigi Vinciguerra | Substack

Excerpt

News, solutions and interviews about web scraping. In this substack you will find weekly content about:

Web Scraping techniques
Interviews with key people in the industry
Anti bot infos and counter measures
Real world examples and code. Click to read The Web Scraping Club, by Pierluigi Vinciguerra, a Substack publication with thousands of subscribers.

The Lab #52: Scraping with LLMs and ScrapeGraphAi - part 1

Are LLMs the Holy Graal for web scraping?

May 30 •

Pierluigi Vinciguerra

The Lab #51: APIs with Bearer Token

Scraping data from API endpoints requiring Bearer Token

May 17 •

Pierluigi Vinciguerra

Celebrating the 50th article of The Lab series

A brief review of the first 50 episodes of The Lab series

May 10 •

Pierluigi Vinciguerra

The Lab #49: Bypassing Cloudflare with open source repositories

And my two cents about these solutions

May 3 •

Pierluigi Vinciguerra

The Lab #48: Scraping with AWS Lambda

Using Serverless and Selenium on Lambda for gathering data

Apr 12 •

Pierluigi Vinciguerra

The Lab #47: Scraping real time data with Python

Using WebSocket to scrape data from Bitstamp and Sofascore

Apr 4 •

Pierluigi Vinciguerra

The Lab #46: Fingerprint injection in Playwright

A home-made solution to bypass anti-bots by changing your browser fingerprint.

Mar 29 •

Pierluigi Vinciguerra

THE LAB #45: Bypassing Geo-fencing While Scraping

How to scrape websites that are banned in your country

Mar 22 •

Pierluigi Vinciguerra

The Lab #44: Scraping the dark web

Scraping the dark web with Playwright and Brave

Mar 7 •

Pierluigi Vinciguerra

The Lab #43: Scraping inventory data: why, how and where

How to use web scraping to get inventory data and estimate sales

Feb 29 •

Pierluigi Vinciguerra

[

](https://substack.thewebscraping.club/p/scraping-inventory-data/comments)

The Lab #42: Bypassing PerimeterX without a browser automation tool

Bypassing PerimeterX with free tools and without running a browser

Feb 23 •

Pierluigi Vinciguerra

The Lab #41: Scrapoxy, the super proxy aggregator

How to use Scrapoxy in your web scraping architecture

Feb 15 •

Pierluigi Vinciguerra

The Lab #40: start a web data monetization project with Data Boutique

Buying and selling high-quality web data has never been so easy.

Feb 8 •

Pierluigi Vinciguerra

The Lab #39: Mouse movements in Playwright

How to move the mouse in Playwright to mimic human behavior

Feb 1 •

Pierluigi Vinciguerra

[

](https://substack.thewebscraping.club/p/bypass-datadome-mouse-movements-in-playwright/comments)

The Lab #38: Bypassing Kasada for web scraping 2024 edition

Another articles with tools and techniques to bypass an anti-bot

Jan 25 •

Pierluigi Vinciguerra

[

](https://substack.thewebscraping.club/p/bypassing-kasada-web-scraping/comments)

The Lab #37: Bypassing Cloudflare with anti-detect browsers - Part 2

Using Kameleo to bypass Cloudflare bot detection

Jan 18 •

Pierluigi Vinciguerra

[

](https://substack.thewebscraping.club/p/bypassing-cloudflare-with-kameleo/comments)

The Lab #36: Bypassing Cloudflare with anti-detect browsers

Configuring GoLogin to bypass Cloudflare bot detection

Jan 11 •

Pierluigi Vinciguerra

What to expect from The Lab posts in 2024

Why I’m writing the “The Lab” articles and what to expect this new year

Jan 4 •

Pierluigi Vinciguerra

The Lab #35: Bypassing PerimeterX with Python and Playwright

Bypassing Perimeterx with free Python tools in 2023.

Dec 21, 2023 •

Pierluigi Vinciguerra

The Lab #34: Bypassing Datadome - End of 2023 Version

Is it possible to bypass Datadome today?

Dec 6, 2023 •

Pierluigi Vinciguerra

[

](https://substack.thewebscraping.club/p/bypassing-datadome-2023-scraping/comments)

THE LAB 33: Fingerprinting at different connection layers

How to create and test a scraper with a coherent fingerprint between the different layers

Nov 30, 2023 •

Pierluigi Vinciguerra

THE LAB 32: hRequests vs anti-bots: a full benchmark

How does it perform against Cloudflare, Akamai, Datadome, PerimeterX and Kasada?

Nov 24, 2023 •

Pierluigi Vinciguerra

THE LAB #31: Scraping location data using a world grid

Building a fundamental tool for scraping location data in a cost-effective way

Nov 9, 2023 •

Pierluigi Vinciguerra

THE LAB #30: How to bypass Akamai protected website when nothing else works

And without paying any commercial solution. An ode to trivial solutions.

Oct 27, 2023 •

Pierluigi Vinciguerra

[

](https://substack.thewebscraping.club/p/the-lab-30-how-to-bypass-akamai-protected/comments)

THE LAB #29: Bypass Cloudflare Bot Protection with Scrapy

Is it possible to bypass Cloudflare without using an headful browser?

Oct 13, 2023 •

Pierluigi Vinciguerra

THE LAB #28: Deep dive on inventory levels tracking

A real world example of scraping inventory level from an heavily Akamai-protected website

Sep 28, 2023 •

Pierluigi Vinciguerra

THE LAB #27: Inventory levels, the holy grail of web scraped data

How web scraping could give advantage in estimating the revenues of companies

Sep 14, 2023 •

Pierluigi Vinciguerra

THE LAB #26: From internal API to insights.

Getting insights on the automotive industry by scraping a car resell website.

Sep 1, 2023 •

Pierluigi Vinciguerra

THE LAB #25: Bypassing Perimeterx in 2023

How to bypass PerimeterX anti-bot solution using both free and commercial solutions

Aug 17, 2023 •

Pierluigi Vinciguerra

THE LAB #24 - Bypassing Akamai using Proxidize

Scraping H&M website to collect e-commerce data, in a reliable way.

Aug 3, 2023 •

Pierluigi Vinciguerra

[

](https://substack.thewebscraping.club/p/bypassing-akamai-proxidize/comments)

Buy cheaper plane tickets using a VPN: truth or myth?

Debunking the myth of different ticket prices from different countries

Jul 20, 2023 •

Pierluigi Vinciguerra

[

](https://substack.thewebscraping.club/p/cheaper-plane-tickets-vpn/comments)

THE LAB #22 - Scraping Akamai protected websites

How Zalando and Rakuten use Akamai to protect its website and how to bypass this solution

Jul 6, 2023 •

Pierluigi Vinciguerra

THE LAB #21 - Bypass anti-bot challenges with AI

How Nimble Browser performs against the most famous anti-bot solutions

Jun 22, 2023 •

Pierluigi Vinciguerra

THE LAB #20 - AI powered web scrapers with Nimble Browser

How artificial intelligence makes web scraping easier

Jun 8, 2023 •

Pierluigi Vinciguerra

THE LAB #19: How to mask your device fingerprint

Beating the fingerprinting by Cloudflare is possible

May 26, 2023 •

Pierluigi Vinciguerra

THE LAB #18: How to scrape Reddit with Scrapy

Scraping subreddits without any commercial product, in two easy different ways.

May 11, 2023 •

Pierluigi Vinciguerra

THE LAB #17: Creating a dataset for investors - Tesla (TSLA)

The creation process for a dataset for stock market analysts: Tesla

Apr 28, 2023 •

Pierluigi Vinciguerra

[

](https://substack.thewebscraping.club/p/dataset-for-investors-tesla-tsla/comments)

THE LAB #16: How to scrape Datadome protected websites (early 2023 version)

Tools and techniques to scrape Datadome protected websites

Apr 14, 2023 •

Pierluigi Vinciguerra

THE LAB #15: Deep diving into Apify world

Let’s use the new python SDK to know better the Apify ecosystem

Mar 30, 2023 •

Pierluigi Vinciguerra

THE LAB #14: Scraping Cloudflare Protected Websites (early 2023 version)

How to scrape Cloudflare protected websites in 2023

Mar 16, 2023 •

Pierluigi Vinciguerra

[

](https://substack.thewebscraping.club/p/scraping-cloudflare-websites-2023-q1-update/comments)

THE LAB #13: Managing a fleet of scrapers with Scrapeops

Using Scrapeops dashboard to monitor your web scraping operations in large web scraping projects

Mar 2, 2023 •

Pierluigi Vinciguerra

THE LAB #12: Reverse-engineering Mobile API

A step by step guide with Charles Proxy and Android Emulator

Feb 16, 2023 •

Fabien Vauchelles

THE LAB #11: The Anti-Detect Anti-Bot matrix

An incomplete but still yes useful list of interesting resources on web scraping. Testing the most well-known web scraping tools in Python against…

Feb 2, 2023 •

Pierluigi Vinciguerra

THE LAB #10: Bypass Cloudflare Bot Protection with GoLogin

A new way to scrape Cloudflare-protected website using antidetect browsers

Jan 19, 2023 •

Pierluigi Vinciguerra

[

](https://substack.thewebscraping.club/p/bypass-cloudflare-scraping-playwright/comments)

THE LAB #9: Scraping OpenSea NFT’s data

Getting winners and losers of the Bored Ape Yacht Club collection transactions

Jan 5, 2023 •

Pierluigi Vinciguerra

THE LAB #8: Using Bezier curves for human-like mouse movements

What are Bezier curves and why are important in web scraping?

Dec 9, 2022 •

Pierluigi Vinciguerra

[

](https://substack.thewebscraping.club/p/bezier-curves-web-scraping/comments)

THE LAB #7: Scraping PerimeterX protected websites

Is scraping Perimeterx website so difficult as it seems?

Nov 24, 2022 •

Pierluigi Vinciguerra

THE LAB #6: Changing Ciphers in Scrapy to avoid bans by TLS Fingerprinting

In other words: fake it until you scrape it

Nov 9, 2022 •

Pierluigi Vinciguerra

The Lab #5 - Scraping Airbnb.com using GraphQL

What is GraphQL and why it is used so widely

Oct 27, 2022 •

Pierluigi Vinciguerra

THE LAB #4: Scrapyd - how to manage and schedule a fleet of scrapers

Pro and cons of actual scheduling solutions for Scrapy

Oct 12, 2022 •

Pierluigi Vinciguerra

THE LAB #3: Scraping Cloudflare protected websites

Without buying any external software, for real.

Sep 27, 2022 •

Pierluigi Vinciguerra

THE LAB #2: scraping data from a website with Datadome and xsrf tokens

A real world use case of a simple scraper that does not get blocked by Datadome

Sep 16, 2022 •

Pierluigi Vinciguerra

THE LAB #1: Scraping data from an app

How to inspect the network traffic of an app with Fiddler Everywhere and scrape the data from its servers.

Sep 5, 2022 •

Pierluigi Vinciguerra

[

](https://substack.thewebscraping.club/p/the-lab-1-scraping-data-from-an-app/comments)

vuthanhdatt's Second Brain

Explorer

The Web Scraping Club Pierluigi Vinciguerra Substack

The Web Scraping Club | Pierluigi Vinciguerra | Substack

Excerpt

Graph View

Table of Contents