curl_cffi 0.7.1

Documentation | Chinese README
Python binding for curl-impersonate
via cffi.
Unlike other pure python http clients like httpx or requests, curl_cffi can
impersonate browsers' TLS/JA3 and HTTP/2 fingerprints. If you are blocked by some
website for no obvious reason, you can give curl_cffi a try.
The fingerprints in 0.6 on Windows are all wrong, you should update to 0.7 if you are on
Windows. Sorry for the inconvenience.
Only Python 3.8 and above are supported. Python 3.7 has reached its end of life.


Scrapfly
is an enterprise-grade solution providing Web Scraping API that aims to simplify the
scraping process by managing everything: real browser rendering, rotating proxies, and
fingerprints (TLS, HTTP, browser) to bypass all major anti-bots. Scrapfly also unlocks
observability by providing an analytical dashboard and measuring the success rate/block
rate in detail.
Scrapfly is a good solution if you are looking for a cloud-managed solution for curl_cffi.
If you are managing TLS/HTTP fingerprint by yourself with curl_cffi, they also maintain a
curl to python converter.

Features

Supports JA3/TLS and HTTP/2 fingerprint impersonation, including recent browsers and custom fingerprints.
Much faster than requests/httpx, on par with aiohttp/pycurl, see benchmarks.
Mimics the requests API, no need to learn another one.
Pre-compiled, so you don't have to compile on your machine.
Supports asyncio with proxy rotation on each request.
Supports HTTP/2, which requests does not.
Supports WebSocket.





|              | requests | aiohttp | httpx | pycurl | curl_cffi |
|--------------|----------|---------|-------|--------|-----------|
| http2        | ❌       | ❌      | ✅    | ✅     | ✅        |
| sync         | ✅       | ❌      | ✅    | ✅     | ✅        |
| async        | ❌       | ✅      | ✅    | ❌     | ✅        |
| websocket    | ❌       | ✅      | ❌    | ❌     | ✅        |
| fingerprints | ❌       | ❌      | ❌    | ❌     | ✅        |
| speed        | 🐇       | 🐇🐇    | 🐇    | 🐇🐇   | 🐇🐇      |

Install
pip install curl_cffi --upgrade

This should work on Linux, macOS and Windows out of the box.
If it does not work on your platform, you may need to compile and install curl-impersonate
first and set some environment variables like LD_LIBRARY_PATH.
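When compiling curl-impersonate yourself, the built library must be visible to the dynamic loader before installing the Python package. A minimal sketch; `/usr/local/lib` is an assumed install prefix, adjust to wherever your build actually landed:

```shell
# Assumed prefix -- point this at your curl-impersonate install location.
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
pip install curl_cffi --upgrade
```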
To install beta releases:
pip install curl_cffi --upgrade --pre

To install unstable version from GitHub:
git clone https://github.com/yifeikong/curl_cffi/
cd curl_cffi
make preprocess
pip install .

Usage
curl_cffi comes with a low-level curl API and a high-level requests-like API.
requests-like
from curl_cffi import requests

# Notice the impersonate parameter
r = requests.get("https://tools.scrapfly.io/api/fp/ja3", impersonate="chrome")

print(r.json())
# output: {..., "ja3n_hash": "aa56c057ad164ec4fdcb7a5a283be9fc", ...}
# the ja3n fingerprint should be the same as the target browser

# To keep using the latest browser version as `curl_cffi` updates,
# simply set impersonate="chrome" without specifying a version.
# Other similar values are: "safari" and "safari_ios"
r = requests.get("https://tools.scrapfly.io/api/fp/ja3", impersonate="chrome")

# To pin a specific version, append the version number to the browser name.
r = requests.get("https://tools.scrapfly.io/api/fp/ja3", impersonate="chrome124")

# To impersonate other than browsers, bring your own ja3/akamai strings
# See examples directory for details.
r = requests.get("https://tls.browserleaks.com/json", ja3=..., akamai=...)

# http/socks proxies are supported
proxies = {"https": "http://localhost:3128"}
r = requests.get("https://tools.scrapfly.io/api/fp/ja3", impersonate="chrome", proxies=proxies)

proxies = {"https": "socks://localhost:3128"}
r = requests.get("https://tools.scrapfly.io/api/fp/ja3", impersonate="chrome", proxies=proxies)
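For the `ja3=...` route above, it helps to know the string's shape: a JA3 string is five comma-separated fields (TLS version, cipher suites, extensions, elliptic curves, point formats), with dash-separated values inside each field. A small illustrative helper to split one for inspection; this is not part of curl_cffi's API, and the sample string is shortened for readability:

```python
def parse_ja3(ja3: str) -> dict:
    """Split a JA3 string into its five fields (illustrative only)."""
    version, ciphers, extensions, curves, point_formats = ja3.split(",")
    split = lambda field: [int(v) for v in field.split("-") if v]
    return {
        "tls_version": int(version),
        "ciphers": split(ciphers),
        "extensions": split(extensions),
        "curves": split(curves),
        "point_formats": split(point_formats),
    }

# Shortened sample, for illustration only.
sample = "771,4865-4866-4867,0-23-65281,29-23-24,0"
print(parse_ja3(sample)["ciphers"])  # [4865, 4866, 4867]
```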

Sessions
s = requests.Session()

# httpbin is an HTTP test website; this endpoint makes the server set cookies
s.get("https://httpbin.org/cookies/set/foo/bar")
print(s.cookies)
# <Cookies[<Cookie foo=bar for httpbin.org />]>

# retrieve cookies again to verify
r = s.get("https://httpbin.org/cookies")
print(r.json())
# {'cookies': {'foo': 'bar'}}

curl_cffi supports the same browser versions as supported by my fork of curl-impersonate:
However, only Chrome-like browsers are supported. Firefox support is tracked in #59.
Browser versions will be added only when their fingerprints change. If a version, e.g.
chrome122, was skipped, you can simply impersonate it with your own headers and the previous version.
If you are trying to impersonate a target other than a browser, use ja3=... and akamai=...
to specify your own customized fingerprints. See the docs on impersonation for details.

chrome99
chrome100
chrome101
chrome104
chrome107
chrome110
chrome116 [1]
chrome119 [1]
chrome120 [1]
chrome123 [3]
chrome124 [3]
chrome99_android
edge99
edge101
safari15_3 [2]
safari15_5 [2]
safari17_0 [1]
safari17_2_ios [1]

Notes:

1. Added in version 0.6.0.
2. Fixed in version 0.6.0; previous HTTP/2 fingerprints were not correct.
3. Added in version 0.7.0.
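The skipped-version trick mentioned above can be sketched like this: reuse the nearest supported fingerprint and send the newer version's headers yourself. The header values below are hypothetical examples written for illustration, not values shipped by curl_cffi:

```python
# Hypothetical Chrome 122 headers; the TLS/HTTP2 layers come from chrome120.
chrome122_headers = {
    "sec-ch-ua": '"Chromium";v="122", "Not(A:Brand";v="24", "Google Chrome";v="122"',
    "user-agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36"
    ),
}

# r = requests.get(url, impersonate="chrome120", headers=chrome122_headers)
```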

asyncio
from curl_cffi.requests import AsyncSession

async with AsyncSession() as s:
    r = await s.get("https://example.com")

More concurrency:
import asyncio
from curl_cffi.requests import AsyncSession

urls = [
    "https://google.com/",
    "https://facebook.com/",
    "https://twitter.com/",
]

async with AsyncSession() as s:
    tasks = []
    for url in urls:
        task = s.get(url)
        tasks.append(task)
    results = await asyncio.gather(*tasks)
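With many URLs, `gather` alone fires every request at once. A common refinement is to cap concurrency with a semaphore; the sketch below uses a stub `fetch` standing in for `await s.get(url)` so the pattern runs stand-alone:

```python
import asyncio

async def fetch(url: str) -> str:
    # Stub for `await s.get(url)` so the pattern runs without a network.
    await asyncio.sleep(0)
    return f"fetched {url}"

async def bounded_gather(urls, limit=5):
    sem = asyncio.Semaphore(limit)

    async def one(url):
        async with sem:  # at most `limit` requests in flight
            return await fetch(url)

    return await asyncio.gather(*(one(u) for u in urls))

urls = [f"https://example.com/{i}" for i in range(10)]
results = asyncio.run(bounded_gather(urls, limit=3))
print(len(results))  # 10
```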

WebSockets
from curl_cffi.requests import Session, WebSocket

def on_message(ws: WebSocket, message):
    print(message)

with Session() as s:
    ws = s.ws_connect(
        "wss://api.gemini.com/v1/marketdata/BTCUSD",
        on_message=on_message,
    )
    ws.run_forever()
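Handlers written in this callback style can be unit-tested without a live connection: anything honoring the same `(ws, message)` contract can drive them. `FakeWS` below is a hypothetical stand-in for testing, not a curl_cffi class:

```python
received = []

def on_message(ws, message):
    received.append(message)

class FakeWS:
    """Hypothetical stand-in that delivers canned frames to a handler."""
    def __init__(self, handler):
        self.handler = handler

    def deliver(self, payload: str):
        self.handler(self, payload)

ws = FakeWS(on_message)
ws.deliver('{"type": "heartbeat"}')
print(received)  # ['{"type": "heartbeat"}']
```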

For low-level APIs, Scrapy integration and other advanced topics, see the
docs for more details.
Acknowledgement

Originally forked from multippt/python_curl_cffi, which is under the MIT license.
Headers/Cookies files are copied from httpx, which is under the BSD license.
Asyncio support is inspired by Tornado's curl http client.
The WebSocket API is inspired by websocket_client.

[Sponsor] Bypass Cloudflare with API

Yescaptcha is a proxy service that bypasses Cloudflare and uses the API interface to obtain verified cookies (e.g. cf_clearance). Click here to register: https://yescaptcha.com/i/stfnIO
[Sponsor] ScrapeNinja

ScrapeNinja is a web scraping API with two engines: a fast one with high-performance TLS
fingerprinting, and a slower one with a real browser under the hood.
ScrapeNinja handles headless browsers, proxies, timeouts, retries, and helps with data
extraction, so you can just get the data in JSON. Rotating proxies are available out of
the box on all subscription plans.

Citation
If you find this project useful, please cite it as below:
@software{Kong2023,
  author = {Yifei Kong},
  title = {curl_cffi - A Python HTTP client for impersonating browser TLS and HTTP/2 fingerprints},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  url = {https://github.com/yifeikong/curl_cffi},
}
