wordseg 0.0.5
wordseg
wordseg is a Python package of word segmentation models.
Table of contents:
Installation
Usage
License
Changelog
Contributing
Citation
Installation
wordseg is available through pip:
pip install wordseg
To install wordseg from the GitHub source:
git clone https://github.com/jacksonllee/wordseg.git
cd wordseg
pip install -e ".[dev]"
Usage
wordseg implements a word segmentation model as a Python class.
A model instance has the following methods
(following the scikit-learn-style API for machine learning):
fit: Train the model with segmented sentences.
predict: Predict the segmented sentences from unsegmented sentences.
The implemented model classes are as follows:
RandomSegmenter:
Each potential word boundary is independently predicted to be a
segmentation point with a given probability. No training is required.
LongestStringMatching:
This model scans an unsegmented sentence from left to right and,
at each position, takes the longest matching word from the training data,
constrained by a maximum word length parameter.
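To make the random-boundary idea behind RandomSegmenter concrete, here is a minimal standalone sketch. It is not the package's implementation: the function name `random_segment` and its `prob` and `seed` parameters are hypothetical, chosen only for illustration.

```python
import random


def random_segment(sentence, prob=0.5, seed=None):
    """Split `sentence` at each internal boundary independently with probability `prob`.

    Hypothetical helper for illustration; not part of wordseg.
    """
    rng = random.Random(seed)
    words = []
    current = ""
    for i, char in enumerate(sentence):
        current += char
        is_last = i == len(sentence) - 1
        # Each position between two characters is a potential word boundary,
        # and the decision at one boundary does not depend on any other.
        if not is_last and rng.random() < prob:
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words
```

With `prob=0.0` the sentence comes back as a single word, and with `prob=1.0` every character becomes its own word. Values in between give segmentations that vary with the seed.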
Sample code snippet:
from wordseg import LongestStringMatching
# Initialize a model.
model = LongestStringMatching(max_word_length=4)
# Train the model.
# `fit` takes an iterable of segmented sentences, each a tuple or list of word strings.
model.fit(
[
("this", "is", "a", "sentence"),
("that", "is", "not", "a", "sentence"),
]
)
# Make some predictions; `predict` gives a generator, which is materialized by list() here.
list(model.predict(["thatisadog", "thisisnotacat"]))
# [['that', 'is', 'a', 'd', 'o', 'g'], ['this', 'is', 'not', 'a', 'c', 'a', 't']]
# We can't get 'dog' and 'cat' because they aren't in the training data.
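The output above can be reproduced by hand with a greedy left-to-right matcher. The following is a rough sketch of that idea under stated assumptions, not the package's own code: `longest_match_segment` is a hypothetical name, the vocabulary is assumed to be the set of words seen in training, and an unmatched character is assumed to fall back to a one-character word.

```python
def longest_match_segment(sentence, vocabulary, max_word_length):
    """Greedy left-to-right longest matching (hypothetical helper, not wordseg's code)."""
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, then shrink one character at a time.
        longest = min(max_word_length, len(sentence) - i)
        for length in range(longest, 0, -1):
            candidate = sentence[i:i + length]
            # Assumption: a single character is always accepted as a fallback.
            if length == 1 or candidate in vocabulary:
                words.append(candidate)
                i += length
                break
    return words


# Words from the two training sentences in the snippet above.
vocabulary = {"this", "is", "a", "sentence", "that", "not"}

print(longest_match_segment("thatisadog", vocabulary, max_word_length=4))
# ['that', 'is', 'a', 'd', 'o', 'g']
```

Because "dog" is not in the vocabulary, no candidate starting at "d" matches, so the matcher falls back to single characters.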
License
MIT License. Please see LICENSE.txt.
Changelog
Please see CHANGELOG.md.
Contributing
Please see CONTRIBUTING.md.
Citation
Lee, Jackson L. 2023. wordseg: Word segmentation models in Python. https://doi.org/10.5281/zenodo.4077433
@software{leengrams,
author = {Jackson L. Lee},
title = {wordseg: Word segmentation models in Python},
year = 2023,
doi = {10.5281/zenodo.4077433},
url = {https://doi.org/10.5281/zenodo.4077433}
}