Serpextract 0.7.3

serpextract provides easy extraction of keywords from search engine results pages (SERPs).
This module is possible in large part thanks to the hard work of the Matomo team.
Specifically, we make extensive use of their list of search engines.

Installation
Latest release on PyPI:
$ pip install serpextract

Usage

Command Line
On the command line, serpextract prints the engine name and the keyword, each enclosed
in quotes and separated by a comma:
$ serpextract "http://www.google.ca/url?sa=t&rct=j&q=ars%20technica"
"Google","ars technica"
You can also print a list of all the SearchEngineParsers currently available in
your local cache:
$ serpextract -l
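
If you call the command-line tool from a script, its quoted, comma-separated output can be read back with Python's csv module. The following is a minimal sketch, assuming the output follows ordinary CSV quoting rules; parse_cli_line is a hypothetical helper name and is not part of serpextract.

```python
import csv
import io


def parse_cli_line(line):
    """Split one line of quoted, comma-separated output into (engine, keyword).

    Assumption: the line follows ordinary CSV quoting rules.
    """
    row = next(csv.reader(io.StringIO(line)))
    engine_name, keyword = row
    return engine_name, keyword


# The sample line is the output shown in the command-line example above.
print(parse_cli_line('"Google","ars technica"'))
# ('Google', 'ars technica')
```

Using a CSV reader instead of splitting on the comma keeps keywords that themselves contain commas intact.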

Python
from serpextract import get_parser, extract, is_serp, get_all_query_params

non_serp_url = 'http://arstechnica.com/'
serp_url = ('http://www.google.ca/url?sa=t&rct=j&q=ars%20technica&source=web&cd=1&ved=0CCsQFjAA'
'&url=http%3A%2F%2Farstechnica.com%2F&ei=pf7RUYvhO4LdyAHf9oGAAw&usg=AFQjCNHA7qjcMXh'
'j-UX9EqSy26wZNlL9LQ&bvm=bv.48572450,d.aWc')

get_all_query_params()
# ['key', 'text', 'search_for', 'searchTerm', 'qrs', 'keyword', ...]

is_serp(serp_url)
# True
is_serp(non_serp_url)
# False

get_parser(serp_url)
# SearchEngineParser(engine_name='Google', keyword_extractor=['q'], link_macro='search?q={k}', charsets=['utf-8'])
get_parser(non_serp_url)
# None

extract(serp_url)
# ExtractResult(engine_name='Google', keyword=u'ars technica', parser=SearchEngineParser(...))
extract(non_serp_url)
# None
Naive Detection
The list of search engine parsers that Matomo and therefore serpextract uses is far from
exhaustive. If you want serpextract to attempt to guess if a given referring URL is a SERP,
you can specify use_naive_method=True to serpextract.is_serp or serpextract.extract.
By default, the naive method is disabled.
Naive search engine detection tries to find an instance of r'\.?search\.' in the netloc
of a URL. If found, serpextract will then try to find a keyword in the query portion of
the URL by looking for the following params in order:
_naive_params = ('q', 'query', 'k', 'keyword', 'term',)
If one of these is found, a keyword is extracted and an ExtractResult is constructed as:
ExtractResult(domain, keyword, None) # No parser, but engine name and keyword
# Not a recognized search engine by serpextract
serp_url = 'http://search.piccshare.com/search.php?cat=web&channel=main&hl=en&q=test'

is_serp(serp_url)
# False

extract(serp_url)
# None

is_serp(serp_url, use_naive_method=True)
# True

extract(serp_url, use_naive_method=True)
# ExtractResult(engine_name=u'piccshare', keyword=u'test', parser=None)
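
The naive method described above can be approximated with the standard library alone. This is only a sketch of the described steps, not serpextract's actual implementation: naive_extract is a hypothetical name, and it returns the full netloc rather than the shortened domain name ('piccshare') that serpextract reports as the engine name.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Pattern and parameter order as quoted in the description above.
NAIVE_RE = re.compile(r'\.?search\.')
NAIVE_PARAMS = ('q', 'query', 'k', 'keyword', 'term')


def naive_extract(url):
    """Return (netloc, keyword) if the URL looks like a SERP, else None."""
    parts = urlsplit(url)
    # Step 1: the netloc must contain something like "search."
    if not NAIVE_RE.search(parts.netloc):
        return None
    # Step 2: take the first non-empty candidate parameter, in order.
    query = parse_qs(parts.query)
    for name in NAIVE_PARAMS:
        values = query.get(name)
        if values:
            return parts.netloc, values[0]
    return None


url = 'http://search.piccshare.com/search.php?cat=web&channel=main&hl=en&q=test'
print(naive_extract(url))
# ('search.piccshare.com', 'test')
```

Because the check only looks at the host name and a handful of parameter names, it can misclassify non-search pages hosted on a "search." subdomain, which is presumably why it is off by default.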
Custom Parsers
If you have a custom search engine that you'd like to track and that is not currently
supported by Matomo/serpextract, you can create your own instance of
serpextract.SearchEngineParser and either pass it explicitly to
serpextract.is_serp or serpextract.extract, or add it
to the internal list of parsers.
# Create a parser for PiccShare
from serpextract import SearchEngineParser, is_serp, extract

my_parser = SearchEngineParser(u'PiccShare', # Engine name
u'q', # Keyword extractor
u'/search.php?q={k}', # Link macro
u'utf-8') # Charset
serp_url = 'http://search.piccshare.com/search.php?cat=web&channel=main&hl=en&q=test'

is_serp(serp_url)
# False

extract(serp_url)
# None

is_serp(serp_url, parser=my_parser)
# True

extract(serp_url, parser=my_parser)
# ExtractResult(engine_name=u'PiccShare', keyword=u'test', parser=SearchEngineParser(engine_name=u'PiccShare', keyword_extractor=[u'q'], link_macro=u'/search.php?q={k}', charsets=[u'utf-8']))
You can also permanently add a custom parser to the internal list of parsers that
serpextract maintains so that you no longer have to explicitly pass a parser
object to serpextract.is_serp or serpextract.extract.
from serpextract import SearchEngineParser, add_custom_parser, is_serp, extract

my_parser = SearchEngineParser(u'PiccShare', # Engine name
u'q', # Keyword extractor
u'/search.php?q={k}', # Link macro
u'utf-8') # Charset
add_custom_parser(u'search.piccshare.com', my_parser)

serp_url = 'http://search.piccshare.com/search.php?cat=web&channel=main&hl=en&q=test'
is_serp(serp_url)
# True

extract(serp_url)
# ExtractResult(engine_name=u'PiccShare', keyword=u'test', parser=SearchEngineParser(engine_name=u'PiccShare', keyword_extractor=[u'q'], link_macro=u'/search.php?q={k}', charsets=[u'utf-8']))
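
The link macro passed to SearchEngineParser contains a {k} placeholder. The sketch below shows one plausible reading of it, assuming {k} stands for the URL-encoded keyword and the macro is appended to the engine's base URL. fill_link_macro is a hypothetical helper, not a serpextract function, and the library may build such links differently.

```python
from urllib.parse import quote_plus


def fill_link_macro(base_url, link_macro, keyword):
    """Build a results-page URL from a base URL, a link macro, and a keyword.

    Assumption: '{k}' in the macro is replaced by the URL-encoded keyword.
    """
    path = link_macro.replace('{k}', quote_plus(keyword))
    # Join with exactly one slash, whether or not the macro starts with one.
    return base_url.rstrip('/') + '/' + path.lstrip('/')


print(fill_link_macro('http://search.piccshare.com', '/search.php?q={k}', 'ars technica'))
# http://search.piccshare.com/search.php?q=ars+technica
```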

Tests
There are some basic tests for popular search engines, but more are required:
$ pip install -r requirements.txt
$ py.test

Caching
Internally, this module caches an OrderedDict representation of
Matomo's list of search engines, which is stored in
serpextract/search_engines.json. The list isn't expected to change often, so this
module ships with a cached version.
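
For readers unfamiliar with the pattern, here is a minimal sketch of loading JSON into an OrderedDict so that entries keep the order they have in the file. The sample data and its layout are invented for illustration; the real structure of search_engines.json is not described here, and load_engines is a hypothetical name.

```python
import json
from collections import OrderedDict

# Hypothetical sample; NOT the real layout of search_engines.json.
SAMPLE_JSON = '{"first.example": ["First", "q"], "second.example": ["Second", "query"]}'


def load_engines(text):
    """Parse JSON text, preserving key order in an OrderedDict."""
    return json.loads(text, object_pairs_hook=OrderedDict)


engines = load_engines(SAMPLE_JSON)
print(list(engines))
# ['first.example', 'second.example']
```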

Overview

For personal and professional use. You cannot resell or redistribute these repositories in their original state.

You're allowed to use the code bits in the repositories in unlimited projects.
Attribution is not required to use the code bits.

What you can do with it

Use them freely in your personal and professional work.

What you can't do with it

Don't be greedy. Selling or distributing these repositories in their original state is prohibited.
