aioscpy 0.3.12

Aioscpy
An asyncio + aiolibs crawler that imitates the Scrapy framework.
Overview
The Aioscpy framework is based on the open-source projects Scrapy and scrapy_redis.
Aioscpy is a fast, high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.
It implements dynamic variable injection and supports asynchronous coroutines.
It also supports distributed crawling/scraping.
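
The coroutine-driven crawl loop that frameworks like Aioscpy build on can be sketched in plain asyncio. This is an illustrative sketch only, not aioscpy's API: `fetch` is a stub standing in for real network I/O.

```python
import asyncio

async def fetch(url):
    # Stand-in for a real aiohttp/httpx request.
    await asyncio.sleep(0)
    return f"<html>{url}</html>"

async def worker(queue, results):
    # Each worker pulls URLs off the shared queue concurrently.
    while True:
        url = await queue.get()
        results.append(await fetch(url))
        queue.task_done()

async def crawl(urls, concurrency=4):
    queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)
    results = []
    workers = [asyncio.create_task(worker(queue, results))
               for _ in range(concurrency)]
    await queue.join()          # wait until every queued URL is processed
    for task in workers:
        task.cancel()
    return results

if __name__ == "__main__":
    pages = asyncio.run(crawl(["https://example.com/a", "https://example.com/b"]))
    print(len(pages))  # 2
```

The same producer/worker pattern underlies the framework's scheduler: requests go onto a queue, and a pool of coroutines drains it concurrently.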
Requirements

Python 3.8+
Works on Linux, Windows, macOS, BSD

Install
The quick way:
# default
pip install aioscpy

# latest version from GitHub
pip install git+https://github.com/ihandmine/aioscpy

# install all dependencies
pip install aioscpy[all]

# install extra packages
pip install aioscpy[aiohttp,httpx]

Usage
Create a project spider:
aioscpy startproject project_quotes

cd project_quotes
aioscpy genspider quotes


quotes.py:
from aioscpy.spider import Spider


class QuotesSpider(Spider):
    name = 'quotes'
    custom_settings = {
        "SPIDER_IDLE": False
    }
    start_urls = [
        'https://quotes.toscrape.com/tag/humor/',
    ]

    async def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'author': quote.xpath('span/small/text()').get(),
                'text': quote.css('span.text::text').get(),
            }

        next_page = response.css('li.next a::attr("href")').get()
        if next_page is not None:
            yield response.follow(next_page, self.parse)
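
The `response.follow` call above resolves the relative `next_page` href against the current page's URL. The resolution itself matches the standard library's `urljoin`, which you can verify standalone:

```python
from urllib.parse import urljoin

# Resolving a relative pagination link against the page it came from.
base = "https://quotes.toscrape.com/tag/humor/"
next_page = "/tag/humor/page/2/"
print(urljoin(base, next_page))  # https://quotes.toscrape.com/tag/humor/page/2/
```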

Create a single-script spider:
aioscpy onespider single_quotes

single_quotes.py:
from aioscpy.spider import Spider
from anti_header import Header
from pprint import pformat


class SingleQuotesSpider(Spider):
    name = 'single_quotes'
    custom_settings = {
        "SPIDER_IDLE": False
    }
    start_urls = [
        'https://quotes.toscrape.com/',
    ]

    async def process_request(self, request):
        request.headers = Header(url=request.url, platform='windows', connection=True).random
        return request

    async def process_response(self, request, response):
        if response.status in [404, 503]:
            return request
        return response

    async def process_exception(self, request, exc):
        raise exc

    async def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'author': quote.xpath('span/small/text()').get(),
                'text': quote.css('span.text::text').get(),
            }

        next_page = response.css('li.next a::attr("href")').get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)

    async def process_item(self, item):
        self.logger.info("{item}", **{'item': pformat(item)})


if __name__ == '__main__':
    quotes = SingleQuotesSpider()
    quotes.start()
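
The `process_request` hook above rotates request headers via `anti_header`. If that package is unavailable, a minimal stand-in can be written with the standard library; the user-agent strings and header set below are illustrative assumptions, not anti_header's actual output:

```python
import random

# Hypothetical stand-in for anti_header.Header(...).random:
# pick a random User-Agent so successive requests vary.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
]

def random_headers():
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept": "text/html,application/xhtml+xml",
        "Connection": "keep-alive",
    }
```

Inside `process_request`, you would then assign `request.headers = random_headers()`.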

Run the spider:
aioscpy crawl quotes
aioscpy runspider quotes.py


start.py:
from aioscpy.crawler import call_grace_instance
from aioscpy.utils.tools import get_project_settings

"""start spider method one:
from cegex.baidu import BaiduSpider
from cegex.httpbin import HttpBinSpider

process = CrawlerProcess()
process.crawl(HttpBinSpider)
process.crawl(BaiduSpider)
process.start()
"""


def load_file_to_execute():
    process = call_grace_instance("crawler_process", get_project_settings())
    process.load_spider(path='./cegex', spider_like='baidu')
    process.start()


def load_name_to_execute():
    process = call_grace_instance("crawler_process", get_project_settings())
    process.crawl('baidu', path="./cegex")
    process.start()


if __name__ == '__main__':
    load_file_to_execute()

More commands:
aioscpy -h

Ready
Please submit your suggestions to the owner by opening an issue.
Thanks
aiohttp
scrapy
loguru
httpx
