Scrapy Proxy Middleware: Complete Configuration Guide (2026)

Published June 4, 2026 · 11 min read

Scrapy is still the workhorse for production crawls in 2026 — and still the framework where proxy setup confuses people most, because there are four different places to plug a proxy in and three of them are wrong for most projects. This guide gives you the right one: a small custom middleware with per-request routing, sticky sessions, ban detection, and sane retry behavior.

If you're on plain requests/httpx/aiohttp instead, see How to Rotate Proxies in Python. This article is Scrapy-specific.

The 30-Second Version

For a rotating residential gateway, the minimum viable setup is one line per request — no middleware needed:

def start_requests(self):
    for url in self.urls:
        yield scrapy.Request(
            url,
            meta={"proxy": "http://USERNAME:PASSWORD@us.jibaoproxy.com:913"},
        )

Scrapy's built-in HttpProxyMiddleware reads request.meta["proxy"] and handles authentication from the URL. The gateway rotates the exit IP for you. If that's all you need, stop here. The rest of this guide is for when you need control: sticky sessions, country routing, ban-aware rotation, and concurrency tuning.

A Production Proxy Middleware

Drop this in middlewares.py. It assigns sticky sessions per domain, rotates on bans, and tags every request so you can debug which session fetched what:

import random
import string

GATEWAY = "us.jibaoproxy.com:913"
USERNAME = "USERNAME"          # move to settings.py / env in real projects
PASSWORD = "PASSWORD"

def _new_session(n=8):
    return "".join(random.choices(string.ascii_lowercase + string.digits, k=n))

class JibaoProxyMiddleware:
    """Sticky session per domain; rotate session on ban."""

    def __init__(self):
        self.sessions = {}          # domain -> session id

    def _proxy_url(self, session_id):
        user = f"{USERNAME}-session-{session_id}"
        return f"http://{user}:{PASSWORD}@{GATEWAY}"

    def process_request(self, request, spider):
        domain = request.url.split("/")[2]
        session = self.sessions.setdefault(domain, _new_session())
        request.meta["proxy"] = self._proxy_url(session)
        request.meta["proxy_session"] = session

    def rotate(self, domain):
        """Call when a session is burned."""
        self.sessions[domain] = _new_session()
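
One detail worth knowing: request.url.split("/")[2] returns the whole network-location part of the URL, so example.com and example.com:8443 (or a URL with embedded credentials) would each get their own session. A minimal sketch of a stricter key using only the standard library; session_key is a hypothetical helper name, not part of Scrapy:

```python
from urllib.parse import urlsplit

def session_key(url):
    """Return the lowercase hostname without port or userinfo (hypothetical helper)."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"URL has no host: {url!r}")
    return host

print(session_key("https://Shop.Example.com:8443/items?page=2"))
```

Whether you want the port in the key depends on the target; for most public sites the hostname alone is the unit the target rate-limits on.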

And a companion downloader middleware that detects bans and retries on a fresh session:

from scrapy.downloadermiddlewares.retry import RetryMiddleware
from scrapy.utils.response import response_status_message

BAN_CODES = {403, 429}
BAN_MARKERS = (b"captcha", b"access denied", b"unusual traffic")

class BanAwareRetryMiddleware(RetryMiddleware):

    def process_response(self, request, response, spider):
        banned = (
            response.status in BAN_CODES
            or any(m in response.body[:2048].lower() for m in BAN_MARKERS)
        )
        if banned:
            domain = request.url.split("/")[2]
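            # Walk the active downloader middlewares and rotate any that
            # expose rotate() (i.e. the proxy middleware above).
            middlewares = spider.crawler.engine.downloader.middleware.middlewares
            for mw in middlewares:
                if hasattr(mw, "rotate"):
                    mw.rotate(domain)        # burn the session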
            reason = response_status_message(response.status)
            return self._retry(request, reason, spider) or response
        return super().process_response(request, response, spider)
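
The detection rule itself can be pulled out into a plain function, which makes it easy to unit-test without running a crawl. This is only a sketch: looks_banned and scan_bytes are hypothetical names, and the codes and markers are the same ones defined in the middleware module.

```python
BAN_CODES = {403, 429}
BAN_MARKERS = (b"captcha", b"access denied", b"unusual traffic")

def looks_banned(status, body, scan_bytes=2048):
    """True if status is a ban code or the first scan_bytes contain a marker."""
    if status in BAN_CODES:
        return True
    head = body[:scan_bytes].lower()   # only scan the start of the page
    return any(marker in head for marker in BAN_MARKERS)

print(looks_banned(200, b"<html>Please solve the CAPTCHA</html>"))
print(looks_banned(200, b"<html>product list</html>"))
```

Marker matching is a heuristic: a page that legitimately mentions "captcha" in its first 2 KB will be treated as a ban, so tune BAN_MARKERS per target.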

Wire both up in settings.py:

DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.JibaoProxyMiddleware": 350,
    "scrapy.downloadermiddlewares.retry.RetryMiddleware": None,   # replace stock retry
    "myproject.middlewares.BanAwareRetryMiddleware": 550,
}
RETRY_TIMES = 2

Priority matters: the proxy middleware must run before Scrapy's HttpProxyMiddleware (750), so anything under 750 works; 350 keeps it early and predictable.
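
To make the ordering concrete, here is a small sketch that sorts middleware priorities the way Scrapy applies them: process_request runs from the lowest number to the highest, and process_response runs in the reverse direction. The dictionary holds only the entries relevant here (750 is the default for HttpProxyMiddleware); request_order and response_order are hypothetical helper names.

```python
MIDDLEWARES = {
    "myproject.middlewares.JibaoProxyMiddleware": 350,
    "scrapy.downloadermiddlewares.retry.RetryMiddleware": None,   # disabled
    "myproject.middlewares.BanAwareRetryMiddleware": 550,
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 750,
}

def request_order(middlewares):
    """Active middlewares in process_request order (lowest priority number first)."""
    active = {name: prio for name, prio in middlewares.items() if prio is not None}
    return sorted(active, key=active.get)

def response_order(middlewares):
    """process_response runs in the opposite direction."""
    return list(reversed(request_order(middlewares)))

for name in request_order(MIDDLEWARES):
    print(name)
```

The custom proxy middleware comes first on the request path, so meta["proxy"] is already set by the time HttpProxyMiddleware looks at it.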

Sticky vs Rotating: Which Mode for Which Spider

Crawl type | Mode | Implementation
Stateless page harvesting | Rotating | Bare username, gateway rotates per request
Login + crawl behind auth | Sticky per account | -session-{account_id}, never rotate mid-login
Pagination-heavy listings | Sticky per domain, rotate on ban | The middleware above
Geo-specific pricing | Rotating + country pin | USERNAME-country-de style parameters
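
A rough sketch of how those username variants could be assembled in one place. proxy_username is a hypothetical helper; the -session- and -country- suffixes follow the patterns shown in this article, while combining both in one username (and the order used here) is an assumption to check against your provider's documentation.

```python
def proxy_username(base, session=None, country=None):
    """Build a gateway username; no suffix means plain per-request rotation."""
    parts = [base]
    if country:
        parts += ["country", country.lower()]
    if session:
        parts += ["session", str(session)]
    return "-".join(parts)

print(proxy_username("USERNAME"))                     # rotating
print(proxy_username("USERNAME", session="acct42"))   # sticky per account
print(proxy_username("USERNAME", country="DE"))       # rotating, country pinned
```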

Deeper treatment of this trade-off: Sticky vs Rotating Proxy Sessions.

Concurrency Settings That Don't Get You Banned

Scrapy defaults are tuned for polite single-IP crawling. Behind a rotating pool you can push much harder — but per-domain limits still matter because the target sees aggregate behavior:

# settings.py - sane starting point behind a residential pool
CONCURRENT_REQUESTS = 64
CONCURRENT_REQUESTS_PER_DOMAIN = 8     # what the target experiences
DOWNLOAD_DELAY = 0.25                  # jitter applied per slot
RANDOMIZE_DOWNLOAD_DELAY = True        # 0.5x-1.5x the delay
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_TARGET_CONCURRENCY = 6.0
DOWNLOAD_TIMEOUT = 30
ROBOTSTXT_OBEY = True

Bump CONCURRENT_REQUESTS_PER_DOMAIN only after watching your 403 rate at the current level for a few thousand requests. Going 8 → 32 because "the proxies rotate anyway" is how people burn through GB on retries.
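
A back-of-the-envelope way to see the retry cost: if every attempt is banned with the same probability, independently of the others (a simplifying assumption, since real bans tend to come in bursts), the expected number of requests per URL is 1 + p + p^2 + ... up to RETRY_TIMES extra attempts. expected_attempts is a hypothetical helper for that arithmetic.

```python
def expected_attempts(ban_rate, retry_times=2):
    """Expected requests per URL when each attempt is banned with ban_rate."""
    if not 0.0 <= ban_rate <= 1.0:
        raise ValueError("ban_rate must be between 0 and 1")
    # Attempt k only happens if the k previous attempts were all banned.
    return sum(ban_rate ** k for k in range(retry_times + 1))

for rate in (0.05, 0.40):
    print(rate, round(expected_attempts(rate), 4))
```

With RETRY_TIMES = 2, a 5% ban rate costs roughly 5% extra requests, while a 40% ban rate costs roughly 56% extra, all of it billed bandwidth.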

Bandwidth Discipline (Residential GB Are Money)

Common Failures and What They Actually Mean

407 Proxy Authentication Required

Credentials didn't reach the proxy, or the proxy rejected them. Put them in the URL (http://user:pass@host:port) in meta["proxy"]; Scrapy parses them and sets Proxy-Authorization for you. Setting the header manually as well as using URL credentials causes double-auth weirdness; pick one.
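
For debugging a 407 it helps to know what that header should look like. The sketch below is a rough stand-in for the idea, not Scrapy's actual code: it splits a proxy URL into the bare endpoint and a Basic Proxy-Authorization value. split_proxy_auth is a hypothetical helper, and the latin-1 encoding is an assumption you may need to change for non-ASCII credentials.

```python
import base64
from urllib.parse import unquote, urlsplit

def split_proxy_auth(proxy_url):
    """Return (proxy URL without credentials, Proxy-Authorization value or None)."""
    parts = urlsplit(proxy_url)
    if parts.hostname is None:
        raise ValueError(f"no host in proxy URL: {proxy_url!r}")
    host = parts.hostname if parts.port is None else f"{parts.hostname}:{parts.port}"
    bare = f"{parts.scheme}://{host}"
    if parts.username is None:
        return bare, None
    # Percent-encoded characters in the URL are decoded before encoding the pair.
    creds = f"{unquote(parts.username)}:{unquote(parts.password or '')}"
    token = base64.b64encode(creds.encode("latin-1")).decode("ascii")
    return bare, f"Basic {token}"

print(split_proxy_auth("http://user:pass@host.example:913"))
```

If a password contains characters such as @ or :, percent-encode it in the URL, otherwise the URL parses into the wrong host or credentials.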

TunnelError: Could not open CONNECT tunnel

Almost always a typo'd host/port, or an HTTPS target going through an endpoint that doesn't allow CONNECT on that port. Verify with curl -x outside Scrapy first.

Spider works for 10 minutes, then everything is 403

Your sticky session outlived its welcome, or your per-domain rate is too hot. The ban-aware middleware above handles the first case; lower CONCURRENT_REQUESTS_PER_DOMAIN for the second. If it's a JA4-checking target, Scrapy's TLS stack itself may be the tell — see JA3/JA4 explained for why no proxy fixes that.

Summary
