django-web-crawler 0.9

Creator: danarutscher

Description:

djangowebcrawler 0.9

Crawler is a Django app that connects to a website and gathers as many links as you want.

Quick start

Add “gatherlinks” to your INSTALLED_APPS setting like this:
INSTALLED_APPS = [
    ...
    'gatherlinks',
]

Import the “main” module like this:
from gatherlinks.crawler import main

Initialize the StartPoint class like this:
crawler = main.StartPoint("https://example.com", max_crawl=50, number_of_threads=10)


The StartPoint class can be initialized with three arguments.

homepage (a positional argument: the URL of the website whose links you want to gather.)
max_crawl (the maximum number of links to gather from the website. Default is 50.)
number_of_threads (the number of threads that do the work simultaneously. Default is 10.)




After initializing the class, call the “start” method like this:
crawler.start()

Once the crawler has finished gathering links, you can access them like this:
crawler.result


The result attribute is a “set” that holds all the links the crawler could gather.
You can then loop through “crawler.result” and do whatever you want with the links (write them to a file or save them to a database).
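
As one way to write the gathered links to a file, here is a minimal sketch. It assumes only what is stated above, that “crawler.result” is a set of link strings. The helper name save_links, the output file name, and the sample set standing in for “crawler.result” are hypothetical and not part of the package.

```python
import tempfile
from pathlib import Path


def save_links(links, path):
    """Write each link on its own line, sorted so the output is stable.

    `links` can be any iterable of URL strings, such as crawler.result.
    Returns the number of links written.
    """
    ordered = sorted(links)
    text = "".join(f"{link}\n" for link in ordered)
    Path(path).write_text(text, encoding="utf-8")
    return len(ordered)


# Stand-in for crawler.result so the sketch runs without the package installed.
sample_result = {"https://example.com/about", "https://example.com/"}

# Hypothetical output location; any writable path works.
out_path = Path(tempfile.gettempdir()) / "gathered_links.txt"
count = save_links(sample_result, out_path)
```

With a real crawler you would pass “crawler.result” instead of the sample set. Sorting is optional; it is used here only because a set has no fixed order.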

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
