bigrams 0.1.2

(N)Grams


Simply create (N)grams: N ~ Bi | Tri ...





Welcome to bigrams, a Python project that provides a non-intrusive way to join tokens in tokenized sentences into (N)grams.
This tool is designed to work with sentences that are already tokenized, and it focuses on a single task: efficiently merging
adjacent tokens across a list of tokenized sentences.
It is non-intrusive because it leaves tokenization, stop-word removal, and other text preprocessing out of its flow.
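Because bigrams leaves tokenization to you, input must be prepared before it reaches the library. The following is a minimal sketch using only the standard library. It assumes that lowercasing and splitting on whitespace are good enough for your text. `tokenize` and its `stopwords` argument are hypothetical helpers and are not part of bigrams:

```python
from typing import Iterable, List


def tokenize(text: str, stopwords: Iterable[str] = ()) -> List[str]:
    """Hypothetical helper (not part of bigrams): lowercase, split on
    whitespace, and drop any tokens listed in `stopwords`."""
    drop = set(stopwords)
    return [tok for tok in text.lower().split() if tok not in drop]


docs = ["This is New York baby again!", "New York and baby again!"]
in_sentences = [tokenize(doc) for doc in docs]
print(in_sentences)
# [['this', 'is', 'new', 'york', 'baby', 'again!'], ['new', 'york', 'and', 'baby', 'again!']]
```

Whitespace splitting keeps punctuation attached (for example, "again!"). For real text, you might swap in a proper tokenizer.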

Source Code: https://github.com/proteusiq/bigrams
PyPI: https://pypi.org/project/bigrams/

Installation
pip install -U bigrams

Usage
To use bigrams, import it into your Python script and use its scikit-learn-style API (for example, fit_transform) to transform your tokens.
from bigrams import Grams

# expects tokenized sentences: a list of lists of string tokens
in_sentences = [["this", "is", "new", "york", "baby", "again!"],
                ["new", "york", "and", "baby", "again!"],
               ]
g = Grams(window_size=2, threshold=2)

out_sentences = g.fit_transform(in_sentences)
print(out_sentences)
# [["this", "is", "new_york", "baby_again!"],
# ["new_york", "and", "baby_again!"],
# ]
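The following sketch makes window_size and threshold concrete by showing one way such a transformer could work. It is an assumption and not the library's actual implementation. `SimpleGrams` is a hypothetical class. During fit, it counts every run of window_size consecutive tokens and keeps the runs seen at least threshold times (the "at least" comparison is also an assumption). During transform, it merges those runs greedily from left to right with "_". On the example above, it produces the same merged sentences.

```python
from collections import Counter
from typing import List, Set, Tuple

Sentence = List[str]


class SimpleGrams:
    """Hypothetical stand-in for bigrams.Grams; NOT the library's code.

    Assumption: an n-gram is kept when it occurs at least `threshold` times.
    """

    def __init__(self, window_size: int = 2, threshold: int = 2) -> None:
        if window_size < 2:
            raise ValueError("window_size must be >= 2")
        self.window_size = window_size
        self.threshold = threshold
        self.grams_: Set[Tuple[str, ...]] = set()

    def fit(self, sentences: List[Sentence]) -> "SimpleGrams":
        # Count every run of `window_size` consecutive tokens in every sentence.
        counts = Counter(
            tuple(sent[i : i + self.window_size])
            for sent in sentences
            for i in range(len(sent) - self.window_size + 1)
        )
        self.grams_ = {gram for gram, n in counts.items() if n >= self.threshold}
        return self

    def transform(self, sentences: List[Sentence]) -> List[Sentence]:
        out = []
        for sent in sentences:
            merged, i = [], 0
            while i < len(sent):
                window = tuple(sent[i : i + self.window_size])
                if window in self.grams_:
                    # Greedy left-to-right merge; skip the tokens just joined.
                    merged.append("_".join(window))
                    i += self.window_size
                else:
                    merged.append(sent[i])
                    i += 1
            out.append(merged)
        return out

    def fit_transform(self, sentences: List[Sentence]) -> List[Sentence]:
        return self.fit(sentences).transform(sentences)


in_sentences = [["this", "is", "new", "york", "baby", "again!"],
                ["new", "york", "and", "baby", "again!"]]
result = SimpleGrams(window_size=2, threshold=2).fit_transform(in_sentences)
print(result)
# [['this', 'is', 'new_york', 'baby_again!'], ['new_york', 'and', 'baby_again!']]
```

In this sketch, window_size=3 would look for trigrams instead. Here, every trigram occurs only once, so nothing would be merged with threshold=2.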

Development

Clone this repository
Requirements:

Poetry
Python 3.7+


Create a virtual environment and install the dependencies

poetry install


Activate the virtual environment

poetry shell

Testing
pytest

Pre-commit
Pre-commit hooks run the auto-formatters (e.g. black, isort), linters and type checkers (e.g. flake8, mypy), and other quality
checks to make sure the changeset is in good shape before a commit or push happens.
To install the hooks so that they run on every commit:
pre-commit install

Or, to run them only on each push:
pre-commit install -t pre-push

Or, to run all checks manually on all files:
pre-commit run --all-files


Contributions are welcome.
To do:

create a save & load function
compare it with gensim's Phrases
write the replacer in Rust (via PyO3)
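One of the to-do items is a save & load function. The following is a minimal sketch of one possible approach, using only the standard library. It assumes that the learned state is a set of token tuples and persists that set as JSON. `save_grams` and `load_grams` are hypothetical names and are not part of bigrams:

```python
import json
import tempfile
from pathlib import Path
from typing import Set, Tuple


def save_grams(grams: Set[Tuple[str, ...]], path: str) -> None:
    """Hypothetical helper: write learned n-grams to a JSON file."""
    # JSON has no tuples or sets, so store a sorted list of lists.
    payload = sorted(list(gram) for gram in grams)
    Path(path).write_text(json.dumps(payload), encoding="utf-8")


def load_grams(path: str) -> Set[Tuple[str, ...]]:
    """Hypothetical helper: read n-grams written by save_grams."""
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    return {tuple(gram) for gram in payload}


# Round trip through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = str(Path(tmp) / "grams.json")
    save_grams({("new", "york"), ("baby", "again!")}, path)
    restored = load_grams(path)
```

JSON keeps the saved file human-readable. pickle would also work, but loading pickles from untrusted sources is unsafe.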

License:

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
