crawl_trulia 0.0.4
Welcome to crawl_trulia Documentation
This is a small project that provides URL routing and HTML parsing tools for crawling www.trulia.com.
Quick Links
GitHub Homepage
PyPI download
Install
Issue submit and feature request
Usage
A real example:
>>> from crawl_trulia.urlencoder import urlencoder
>>> from crawl_trulia.htmlparser import htmlparser
>>> from crawlib.spider import spider # install crawlib first
# use address, city and zipcode
>>> address = "22 Yew Rd"
>>> city = "Baltimore"
>>> zipcode = "21221"
>>> url = urlencoder.by_address_city_and_zipcode(address, city, zipcode)
>>> html = spider.get_html(url)
>>> house_detail_data = htmlparser.get_house_detail(html)
>>> house_detail_data
{
    "features": {},
    "public_records": {
        "AC": "a/c",
        "basement_type": "improved basement (finished)",
        "bathroom": 2,
        "build_year": 1986,
        "county": "baltimore county",
        "exterior_walls": "siding (alum/vinyl)",
        "heating": "heat pump",
        "lot_size": 7505,
        "lot_size_unit": "sqft",
        "partial_bathroom": 1,
        "roof": "composition shingle",
        "sqft": 998
    }
}
# usually the combination of address and zipcode is enough
>>> address = "2004 Birch Rd"
>>> zipcode = "21221"
>>> url = urlencoder.by_address_and_zipcode(address, zipcode)
>>> html = spider.get_html(url)
>>> house_detail_data = htmlparser.get_house_detail(html)
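Since `get_house_detail()` returns a plain Python dict (as shown above), the result can be post-processed with ordinary dict access. A minimal sketch using the sample record from the first example; the `summarize` helper is hypothetical, not part of crawl_trulia:

```python
# Sample record as returned by htmlparser.get_house_detail() (from the example above)
house_detail_data = {
    "features": {},
    "public_records": {
        "AC": "a/c",
        "basement_type": "improved basement (finished)",
        "bathroom": 2,
        "build_year": 1986,
        "county": "baltimore county",
        "exterior_walls": "siding (alum/vinyl)",
        "heating": "heat pump",
        "lot_size": 7505,
        "lot_size_unit": "sqft",
        "partial_bathroom": 1,
        "roof": "composition shingle",
        "sqft": 998,
    },
}

def summarize(record):
    """Pull a few commonly used fields out of a house detail dict.

    Uses .get() throughout, since fields may be missing for some listings.
    """
    pr = record.get("public_records", {})
    return {
        "build_year": pr.get("build_year"),
        "sqft": pr.get("sqft"),
        # count a partial bathroom as a half bath
        "bathrooms": pr.get("bathroom", 0) + 0.5 * pr.get("partial_bathroom", 0),
    }

summary = summarize(house_detail_data)
print(summary)  # {'build_year': 1986, 'sqft': 998, 'bathrooms': 2.5}
```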
Install
crawl_trulia is released on PyPI, so all you need is:
$ pip install crawl_trulia
To upgrade to the latest version:
$ pip install --upgrade crawl_trulia