whoosh-igo 0.7

Creator: bradpython12

About
Tokenizers for the Whoosh full-text search library, designed for Japanese text.
This package contains three tokenizers.

IgoTokenizer

requires igo-python (http://pypi.python.org/pypi/igo-python/) and its dictionary.

TinySegmenterTokenizer

requires TinySegmenter in Python (https://code.google.com/p/mhagiwara/source/browse/trunk/nltk/jpbook/tinysegmenter.py)

MeCabTokenizer

requires the MeCab Python binding (http://mecab.sourceforge.net/bindings.html)

How To Use
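IgoTokenizer:

import igo.Tagger
import whooshjp
from whoosh.fields import Schema, TEXT, ID
from whooshjp.IgoTokenizer import IgoTokenizer

# 'ipadic' is the Igo dictionary directory
tk = IgoTokenizer(igo.Tagger.Tagger('ipadic'))
# use the tokenizer as the analyzer for every Japanese text field
scm = Schema(title=TEXT(stored=True, analyzer=tk), path=ID(unique=True, stored=True), content=TEXT(analyzer=tk))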
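
TinySegmenterTokenizer:

import tinysegmenter
import whooshjp
from whoosh.fields import Schema, TEXT, ID
from whooshjp.TinySegmenterTokenizer import TinySegmenterTokenizer

# TinySegmenter needs no external dictionary
tk = TinySegmenterTokenizer(tinysegmenter.TinySegmenter())
# use the tokenizer as the analyzer for every Japanese text field
scm = Schema(title=TEXT(stored=True, analyzer=tk), path=ID(unique=True, stored=True), content=TEXT(analyzer=tk))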


Changelog for Japanese Tokenizers for Whoosh

2011-02-19 – 0.1

first release.


2011-02-21 – 0.2

add TinySegmenterTokenizer
change module name


2011-02-24 – 0.3

add FeatureFilter


2011-02-27 – 0.4

add MeCabTokenizer
add a mode that does not pickle the Igo tagger, to minimize index size


2011-04-17 – 0.5

correct char offsets


2011-04-17 – 0.6

correct char offsets (TinySegmenterTokenizer)


2012-04-14 – 0.7

rename package (WhooshJapaneseTokenizer to whooshjp)
submodules are no longer imported automatically
Python 3 compatibility (3.2, 3.3)
drop Python 2.5 support

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
