Instagram and TikTok are the two hardest mainstream targets in scraping — harder than Google, harder than Amazon. Both run aggressive IP scoring, both fingerprint your client at multiple layers, and both are famous for the soft block: instead of a clean 403, you get login walls, empty JSON, "challenge_required", or silently truncated results. Your scraper "works" and your data is garbage.
This guide covers how each platform actually blocks you in 2026, the proxy setup that survives, and the request patterns that keep accounts and sessions alive. (For multi-account management rather than scraping, see our AdsPower/Multilogin and GoLogin/Dolphin Anty guides.)
On Instagram, flagged traffic gets pushed to login walls and challenge_required. Instagram's tolerance for shared or abused residential IPs is also low, so pool quality matters more here than almost anywhere.

TikTok's web API calls carry X-Bogus/signature parameters generated by obfuscated JavaScript. Pure-HTTP scraping means reverse-engineering signatures that rotate every few weeks, so most teams run a real browser (Playwright) instead and intercept the JSON responses. When TikTok throttles a session, the tell is empty itemList responses that look like "no more content".

```
# One identity = one sticky session, country-pinned
socks5h://USERNAME:PASSWORD@us.jibaoproxy.com:913
```
Rotating per request is fine for anonymous public pages at low volume. The moment a session cookie or login is involved, switch to sticky sessions: cookie and IP must move together as one identity.
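The "one identity" rule can be sketched as a small Python class that keeps the sticky proxy session and the cookie jar together, so rotating one always rotates the other. This is a minimal sketch under assumptions: the `Identity` class and its methods are hypothetical, and the `-country-…-session-…` username syntax is a common provider convention used here for illustration, not a documented jibaoproxy format. Check your provider's dashboard for the real session parameters.

```python
import uuid
from dataclasses import dataclass, field
from urllib.parse import quote


@dataclass
class Identity:
    """One scraping identity: pinned country, sticky proxy session, and its cookies."""
    country: str
    session_id: str = field(default_factory=lambda: uuid.uuid4().hex[:8])
    cookies: dict = field(default_factory=dict)

    def proxy_url(self, user: str, password: str, host: str, port: int) -> str:
        # HYPOTHETICAL username syntax: many providers encode country and session
        # in the username, but the exact format is vendor-specific.
        username = f"{user}-country-{self.country}-session-{self.session_id}"
        return f"socks5h://{quote(username, safe='')}:{quote(password, safe='')}@{host}:{port}"

    def rotate(self) -> None:
        # A burned identity gets a new sticky session AND a fresh cookie jar,
        # never one without the other.
        self.session_id = uuid.uuid4().hex[:8]
        self.cookies.clear()


ident = Identity(country="us", session_id="abc12345")
print(ident.proxy_url("USERNAME", "PASSWORD", "us.jibaoproxy.com", 913))
# socks5h://USERNAME-country-us-session-abc12345:PASSWORD@us.jibaoproxy.com:913
```

The resulting URL drops straight into curl_cffi's `proxies` dict; persist `cookies` next to it so a restarted worker resumes the same identity instead of mixing an old cookie with a new IP.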
```python
from curl_cffi import requests

PROXY = {"https": "socks5h://USERNAME:PASSWORD@us.jibaoproxy.com:913"}

# Public profile JSON (no login); impersonate Chrome's TLS fingerprint
r = requests.get(
    "https://www.instagram.com/api/v1/users/web_profile_info/?username=nasa",
    impersonate="chrome",
    headers={"X-IG-App-ID": "936619743392459"},
    proxies=PROXY,
)
data = r.json()["data"]["user"]
print(data["edge_followed_by"]["count"])
```
Watch the response, not the status code: a 200 with {"data": null} or a redirect to /accounts/login/ means this IP's budget is spent — rotate the identity, back off, and resume.
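That check can be folded into a helper so every response is classified before you parse it. A minimal sketch, tuned to the `web_profile_info` response shown here and assuming you pass the final URL after redirects (curl_cffi exposes it as `r.url`) plus the parsed JSON body, or `None` when the body isn't JSON. The function names, reason labels, and backoff constants are illustrative, and reading `challenge_required` from a `"message"` field is an assumption about the payload shape.

```python
import random
from typing import Optional


def soft_block_reason(final_url: str, body) -> Optional[str]:
    """Return why a response looks soft-blocked, or None if it looks usable (heuristic)."""
    if "/accounts/login" in final_url:
        return "login_wall"  # redirected to the login page
    if not isinstance(body, dict):
        return "not_json"    # HTML or empty body where JSON was expected
    if body.get("message") == "challenge_required":
        return "challenge"   # ASSUMED payload shape for the challenge signal
    if body.get("data") is None:
        return "null_data"   # 200 OK but {"data": null}
    return None


def backoff_delay(attempt: int, base: float = 30.0, cap: float = 900.0) -> float:
    """Exponential backoff with full jitter; base and cap (seconds) are illustrative."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


print(soft_block_reason("https://www.instagram.com/accounts/login/", None))  # login_wall
```

On any non-`None` reason, rotate the identity, sleep for `backoff_delay(attempt)`, and increment `attempt`; reset it to zero after the next clean response.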
```python
from playwright.sync_api import sync_playwright

items = []


def capture(res):
    # Let TikTok's own JS sign the requests; we just read the answers
    if "/api/post/item_list" in res.url:
        try:
            items.extend(res.json().get("itemList", []))
        except Exception:
            pass  # redirect, non-JSON, or unavailable body: skip it


with sync_playwright() as p:
    browser = p.chromium.launch(proxy={
        "server": "us.jibaoproxy.com:913",
        "username": "USERNAME",
        "password": "PASSWORD",
    })
    page = browser.new_page()
    page.on("response", capture)
    page.goto("https://www.tiktok.com/@nasa")
    for _ in range(5):
        page.mouse.wheel(0, 2500)
        page.wait_for_timeout(1800)  # human-ish scroll cadence
    browser.close()

print(len(items), "videos captured")
```
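This sidesteps signature reverse-engineering entirely: the page generates valid X-Bogus values itself, and you harvest the JSON. The proxy has to be configured before any page is created, either at browser launch or per context with `browser.new_context(proxy=...)`; authentication quirks are covered in our Playwright proxy authentication guide.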
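The interception loop can't tell the end of a feed from a throttled session, because both can arrive as an empty `itemList`. A minimal sketch of that distinction, assuming you keep each raw item_list payload (for example, by appending `res.json()` in the response handler instead of only its `itemList`). `classify_item_list` and `should_rotate` are hypothetical helpers, and treating an empty page with `hasMore` set as a soft block is a heuristic, not a documented TikTok contract.

```python
def classify_item_list(payload: dict) -> str:
    """Label one captured /api/post/item_list payload (heuristic)."""
    items = payload.get("itemList") or []
    if items:
        return "ok"
    if payload.get("hasMore"):
        return "soft_block"  # API claims more content exists but sent none
    return "end"             # empty and hasMore is false: genuine end of feed


def should_rotate(payloads: list) -> bool:
    """Stop scrolling and rotate the identity once any page looks throttled."""
    return any(classify_item_list(p) == "soft_block" for p in payloads)


pages = [{"itemList": [{"id": "1"}], "hasMore": True}, {"itemList": [], "hasMore": True}]
print([classify_item_list(p) for p in pages], should_rotate(pages))
# ['ok', 'soft_block'] True
```

Checking this inside the scroll loop lets you abandon a burned session early instead of scrolling through pages that will never fill.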
Alert on soft-block signals, not HTTP errors: login redirects, challenge_required, and an empty itemList with hasMore: true.

| | Instagram | TikTok |
|---|---|---|
| Best access path | Web GraphQL/API endpoints, curl_cffi | Playwright + response interception |
| Proxy type | Residential sticky per identity | Residential, geo-matched to market |
| Datacenter IPs | Login wall / challenge | Instant captcha |
| Soft-block signal | login redirect, challenge_required, null data | empty itemList, captcha page |
| Geo sensitivity | Moderate | High — content differs by country |
Country-pinned, sticky sessions, clean ASNs — 500MB free traffic, no card required.
New users get 500MB free traffic instantly, plus an extra first-deposit reward (limited-time offer).