sitecrawl 1.0.5

Creator: bigcodingguy24

Simple Python module to crawl a website and extract URLs.

Installation
Using pip:
pip3 install sitecrawl

sitecrawl --help
Or build from source:
# Clone project
git clone https://github.com/gabfl/sitecrawl && cd sitecrawl

# Install from the local checkout
pip3 install .


Usage

CLI
sitecrawl --url https://www.yahoo.com/ --depth 2 --max 4 --verbose
->
* Found 4 internal URLs
https://www.yahoo.com
https://www.yahoo.com/entertainment
https://www.yahoo.com/lifestyle
https://www.yahoo.com/plus

* Found 5 external URLs
https://mail.yahoo.com/
https://news.yahoo.com/
https://finance.yahoo.com/
https://sports.yahoo.com/
https://shopping.yahoo.com/

* Skipped 0 URLs
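
In the output above, URLs on www.yahoo.com are listed as internal while those on other hosts such as mail.yahoo.com are listed as external. The following is a minimal sketch of one way such a split could be made, by comparing hostnames with the standard library. It is not taken from sitecrawl's source; the function name `split_by_host` is hypothetical and the library's actual rules may differ.

```python
from urllib.parse import urlsplit


def split_by_host(base_url, urls):
    """Partition urls into (internal, external) by hostname.

    Hypothetical helper for illustration only, not part of sitecrawl.
    """
    base_host = urlsplit(base_url).hostname
    internal, external = [], []
    for url in urls:
        host = urlsplit(url).hostname
        # Same hostname as the base URL counts as internal,
        # anything else (including other subdomains) as external.
        if host == base_host:
            internal.append(url)
        else:
            external.append(url)
    return internal, external


found = [
    "https://www.yahoo.com/entertainment",
    "https://mail.yahoo.com/",
    "https://www.yahoo.com/plus",
    "https://finance.yahoo.com/",
]
internal, external = split_by_host("https://www.yahoo.com/", found)
print("Internal:", internal)
print("External:", external)
```

Comparing full hostnames treats every subdomain as a separate site, which matches the sample output; a crawler that wanted to treat all of yahoo.com as one site would need to compare registered domains instead.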


As a module
Basic example:
from sitecrawl import crawl

crawl.base_url = 'https://www.yahoo.com'
crawl.deep_crawl(depth=2)

print('Internal URLs:', crawl.get_internal_urls())
print('External URLs:', crawl.get_external_urls())
print('Skipped URLs:', crawl.get_skipped_urls())
A more detailed example is available in example.py.
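
The results from the basic example can be post-processed like any other Python data. Below is a small sketch that turns three collections of URLs into a JSON report. The helper `build_report` is hypothetical, not part of sitecrawl, and it assumes the getters return iterables of URL strings.

```python
import json


def build_report(internal, external, skipped):
    """Return a JSON string summarizing three collections of URL strings.

    Hypothetical helper for illustration only, not part of sitecrawl.
    """
    report = {
        "internal": sorted(set(internal)),
        "external": sorted(set(external)),
        "skipped": sorted(set(skipped)),
    }
    # Count the deduplicated URLs in each category.
    report["counts"] = {name: len(urls) for name, urls in report.items()}
    return json.dumps(report, indent=2)


# With sitecrawl, the arguments would come from the getters shown in the
# basic example (assumed here to return iterables of strings):
#   build_report(crawl.get_internal_urls(),
#                crawl.get_external_urls(),
#                crawl.get_skipped_urls())
sample = build_report(
    ["https://www.yahoo.com", "https://www.yahoo.com/plus"],
    ["https://mail.yahoo.com/"],
    [],
)
print(sample)
```

Sorting and deduplicating before serializing keeps the report stable between runs, which makes it easier to diff two crawls of the same site.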

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
