robotsparsetools 1.3.3
robots.txt is important when crawling websites
This module will help you parse robots.txt
Install
$ pip install robotsparsetools
Usage
Parse
Please create a Parse instance first
from robotsparsetools import Parse
url = "URL of robots.txt you want to parse"
p = Parse(url) # Create an instance. Returns a Parse object keyed by useragent
# Get allow list
p.Allow(useragent)
# Get disallow list
p.Disallow(useragent)
# Get value of Crawl-delay (return value is int or None)
p.delay(useragent)
# Find out if crawls are allowed
p.can_crawl(url, useragent)
If no useragent is specified, the value of '*' will be referenced
Also, since the Parse class inherits from dict, you can use it like a dict
from robotsparsetools import Parse
p = Parse(url)
p["*"]
p.get("*") # Can also use get method
Read
You can parse robots.txt by passing its text or a local file path to Read
from robotsparsetools import Read
import requests
url = "URL of robots.txt you want to parse"
r = requests.get(url)
p = Read(r.text)
path = "File path of robots.txt you want to parse"
p = Read(path)
The return value is a Parse instance
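Because Read returns a Parse instance, the methods shown above work on its result as well; a minimal sketch, assuming a robots.txt file sits next to the script (the path is a placeholder):
from robotsparsetools import Read

p = Read("./robots.txt") # placeholder local path
print(p.Disallow("*")) # same API as a Parse instance built from a URL
print(p.delay()) # Crawl-delay for '*', or None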
Make (✨ new in 1.3)
You can easily generate the contents of a robots.txt file with Make
from robotsparsetools import Make
base = Make()
base.add_sitemap("https://xxxxxx.com/sitemap.xml")
all = base.add_useragent("*")
all.add_disallow("/hoge")
bot = base.add_useragent("bot")
bot.add_allow(["/example", "/any/*"])
bot.add_disallow(["/test", "/xxx/"])
path = "File path"
base.to_file(path) # Output the result to a file
print(base.make()) # Generate the robots.txt text as a string
Below is the result of this code
User-agent: *
Disallow: /hoge
User-agent: bot
Disallow: /test
Disallow: /xxx/
Allow: /example
Allow: /any/*
Sitemap: https://xxxxxx.com/sitemap.xml
Error Classes
There are also three error classes
from robotsparsetools import NotURLError, NotFoundError, UserAgentExistsError
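The conditions under which each error is raised are not spelled out above, but a plausible defensive pattern is sketched below (the assumption that Parse raises NotURLError for a malformed URL and NotFoundError when robots.txt cannot be fetched is ours; UserAgentExistsError presumably comes from Make.add_useragent when the same useragent is added twice, though that is also an assumption):
from robotsparsetools import Parse, NotURLError, NotFoundError

try:
    p = Parse("https://example.com/robots.txt") # placeholder URL
except NotURLError:
    print("the argument does not look like a URL") # assumed trigger
except NotFoundError:
    print("robots.txt could not be found at that location") # assumed trigger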
Command line
You can use the rp command
$ rp URL # With no options, outputs Y if crawling is allowed and N if not
$ rp -a URL # Output the Allow list
$ rp -d URL # Output the Disallow list
$ rp -c URL # Output the Crawl-delay
License
This program is licensed under MIT