youcab 0.1.3

Creator: bradpython12

Description:

YouCab: Converts MeCab Parsing Results to Python Objects





Installation
Install MeCab
MeCab is required for YouCab to work.
If it is not already installed, install MeCab first.
Install YouCab
$ pip install youcab
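
A quick sanity check after installing can be sketched as follows. It assumes the MeCab command-line tool is named mecab and is on PATH, which may not hold for installs that provide only the library; check_install is a hypothetical helper, not part of YouCab.

```python
import importlib.util
import shutil


def check_install():
    """Report whether the `mecab` command and the `youcab` package can be found."""
    return {
        # Assumption: the MeCab CLI is on PATH under the name "mecab".
        "mecab_cli": shutil.which("mecab") is not None,
        # find_spec returns None when the top-level package is not importable.
        "youcab": importlib.util.find_spec("youcab") is not None,
    }


print(check_install())
```

If either value is False, revisit the corresponding installation step above.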

Tokenize a Japanese sentence
This example generates a tokenizer that uses MeCab's default dictionary and runs it on a sentence.
The tokenizer converts text into a list of Word objects.
from youcab import youcab

tokenize = youcab.generate_tokenizer()
words = tokenize("本を読んだ")
for word in words:
    print("surface: " + word.surface)
    print("pos : " + str(word.pos))
    print("base : " + word.base)
    print("c_type : " + word.c_type)
    print("c_form : " + word.c_form)
    print("")

surface: 本
pos : ['名詞', '一般']
base : 本
c_type :
c_form :

surface: を
pos : ['助詞', '格助詞', '一般']
base : を
c_type :
c_form :

surface: 読ん
pos : ['動詞', '自立']
base : 読む
c_type : 五段・マ行
c_form : 連用タ接続

surface: だ
pos : ['助動詞']
base : だ
c_type : 特殊・タ
c_form : 基本形
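
Because each Word exposes pos as a list and base as the dictionary form, a common next step is to keep only content words. The sketch below is one way to do that; content_lemmas is a hypothetical helper, not part of YouCab, and the stand-in objects only mirror the attributes shown in the output above so the code runs without MeCab installed.

```python
from types import SimpleNamespace


def content_lemmas(words, keep=("名詞", "動詞")):
    """Return base forms of words whose first (coarsest) POS tag is in `keep`."""
    lemmas = []
    for word in words:
        # word.pos is a list such as ['名詞', '一般']; the first entry is the coarse tag.
        if word.pos and word.pos[0] in keep:
            lemmas.append(word.base)
    return lemmas


# Stand-ins copying the values printed above; real Word objects come from tokenize().
sample = [
    SimpleNamespace(surface="本", pos=["名詞", "一般"], base="本"),
    SimpleNamespace(surface="を", pos=["助詞", "格助詞", "一般"], base="を"),
    SimpleNamespace(surface="読ん", pos=["動詞", "自立"], base="読む"),
    SimpleNamespace(surface="だ", pos=["助動詞"], base="だ"),
]
print(content_lemmas(sample))  # ['本', '読む']
```

With YouCab installed, passing the list returned by tokenize("本を読んだ") in place of sample should behave the same way, provided the dictionary produces the tags shown above.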

Works with any MeCab dictionary
YouCab can use any MeCab dictionary, such as IPAdic, UniDic, or NEologd. Pass the dictionary directory as dicdir:
from youcab import youcab

tokenize = youcab.generate_tokenizer(dicdir="/path/to/mecab/dic/dir/")
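
When a script has to run on machines with different dictionaries installed, it can help to pick the first directory that actually exists before passing it as dicdir. The following is a minimal sketch under that assumption; first_valid_dicdir is a hypothetical helper, and the check relies on compiled MeCab dictionary directories containing a dicrc file.

```python
from pathlib import Path


def first_valid_dicdir(candidates):
    """Return the first candidate that looks like a MeCab dictionary directory, else None."""
    for candidate in candidates:
        path = Path(candidate)
        # Assumption: a usable dictionary directory contains a `dicrc` file.
        if path.is_dir() and (path / "dicrc").is_file():
            return str(path)
    return None


# Hypothetical usage; the paths are placeholders, not real locations:
# dicdir = first_valid_dicdir(["/path/to/neologd/dir/", "/path/to/ipadic/dir/"])
# if dicdir is not None:
#     tokenize = youcab.generate_tokenizer(dicdir=dicdir)
# else:
#     tokenize = youcab.generate_tokenizer()  # fall back to MeCab's default dictionary
```

Falling back to generate_tokenizer() with no arguments keeps the script usable when none of the preferred dictionaries is present.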

License

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
