Lidtk 0.3.0 | GitLocker.com Product

Description:

lidtk
lidtk, the language identification toolkit, was written to investigate the
current state of language identification performance.
Installation
The recommended way to install lidtk is:
$ pip install lidtk --user

If you want the latest development version, install from source:
$ git clone https://github.com/MartinThoma/lidtk.git
$ cd lidtk
$ pip install -e . --user

I recommend getting the WiLI-2018 dataset.
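A minimal sketch for reading WiLI-2018 into Python. It assumes the unpacked archive stores one paragraph per line in x_train.txt / x_test.txt, with the matching language label on the same line of y_train.txt / y_test.txt; check the README shipped with the dataset if your copy differs. load_wili_split is a hypothetical helper, not part of lidtk.

```python
from pathlib import Path


def _read_lines(path: Path) -> list[str]:
    # Iterate the file instead of calling str.splitlines(): splitlines() also
    # breaks on characters such as U+2028 or U+0085, which can appear inside a
    # paragraph and would shift every later label by one line.
    with path.open(encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def load_wili_split(directory: str, split: str = "train") -> list[tuple[str, str]]:
    """Return (paragraph, label) pairs for one split of an unpacked WiLI-2018 copy.

    Assumes line i of x_<split>.txt is labelled by line i of y_<split>.txt.
    """
    base = Path(directory)
    texts = _read_lines(base / f"x_{split}.txt")
    labels = _read_lines(base / f"y_{split}.txt")
    if len(texts) != len(labels):
        raise ValueError(f"{len(texts)} paragraphs but {len(labels)} labels")
    return list(zip(texts, labels))
```

For instance, load_wili_split("wili-2018", "test") would read the test split from a folder named wili-2018 (the folder name is only an example).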
Usage
$ lidtk --help

Usage: lidtk [OPTIONS] COMMAND [ARGS]...

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  analyze-data           Utility function for the languages...
  analyze-unicode-block  Analyze how important a Unicode block is for...
  char-distrib           Use the character distribution language...
  cld2                   Use the CLD-2 language classifier.
  create-dataset         Create sharable dataset from downloaded...
  download               Download 1000 documents of each language.
  google-cloud           Use the CLD-2 language classifier.
  langdetect             Use the langdetect language classifier.
  langid                 Use the langid language classifier.
  map                    Map predictions to something known by WiLI
  nn                     Use a neural network classifier.
  textcat                Use the CLD-2 language classifier.
  tfidf_nn               Use the TfidfNNClassifier classifier.


For example:
$ lidtk cld2 predict --text 'This is a test.'
eng
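Here the prediction comes out as eng, an ISO 639-3 code, while many language identifiers report two-letter ISO 639-1 codes such as en. Turning predictions into labels WiLI knows is the job the map command describes; its real table is not reproduced here. A hypothetical sketch of that kind of normalization, with a deliberately tiny table:

```python
# Hypothetical excerpt of an ISO 639-1 (two-letter) to ISO 639-3 (three-letter)
# table; a real mapping would need to cover every language the classifier emits.
ISO_639_1_TO_3 = {
    "de": "deu",
    "en": "eng",
    "es": "spa",
    "fr": "fra",
    "nl": "nld",
}


def to_iso_639_3(code: str, unknown: str = "und") -> str:
    """Normalize a predicted language code to a three-letter code.

    Three-letter codes pass through unchanged; other codes are looked up in the
    table, falling back to `unknown` ("und" is the ISO 639 code for
    "undetermined").
    """
    code = code.strip().lower()
    if len(code) == 3:
        return code
    return ISO_639_1_TO_3.get(code, unknown)
```

Falling back to a fixed placeholder instead of raising keeps a batch evaluation running when a classifier emits a code the table does not know.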

The usual order is:

1. lidtk download: use WiLI-2018 instead of downloading the dataset on your own.
2. lidtk create-dataset: this step can be skipped if you use WiLI-2018.
3. lidtk analyze-unicode-block --start 0 --end 128
4. lidtk tfidf_nn train vectorizer --config lidtk/classifiers/config/tfidf_nn.yaml
5. lidtk tfidf_nn wili --config lidtk/classifiers/config/tfidf_nn.yaml
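The analyze-unicode-block step looks at how important a Unicode block is; its help text is cut off, so what the command actually reports is not stated here. Code points 0 to 127 form the Basic Latin (ASCII) block. One plausible measurement, sketched below as a hypothetical helper (block_share is not a lidtk function), is the share of each language's characters that fall in the range, treating --end as exclusive (an assumption):

```python
from collections import defaultdict


def block_share(
    samples: list[tuple[str, str]], start: int = 0, end: int = 128
) -> dict[str, float]:
    """Fraction of each language's characters whose code point lies in [start, end).

    `samples` holds (text, label) pairs. Treating `end` as exclusive mirrors
    Python's range() and is an assumption, not documented lidtk behaviour.
    """
    inside: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for text, label in samples:
        total[label] += len(text)
        inside[label] += sum(start <= ord(char) < end for char in text)
    # Languages with no characters at all are skipped to avoid dividing by zero.
    return {label: inside[label] / count for label, count in total.items() if count}
```

Text in a Latin-script language without diacritics scores close to 1.0 for this range, while Greek or Cyrillic text scores close to 0.0, which is what makes a block useful for telling scripts apart.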

Or, to use a classifier directly:
$ lidtk cld2 predict --text 'This text is written in some language.'
eng
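The command list also includes a char-distrib classifier that, judging by its name and truncated help text, works from character distributions. lidtk's implementation is not shown here; the following is a hypothetical stand-alone sketch of one such approach, a character-unigram naive Bayes model. The class and method names are illustrative, not lidtk's API.

```python
import math
from collections import Counter


class CharDistributionClassifier:
    """Character-unigram naive Bayes with add-one smoothing (illustrative only).

    Every language gets the same prior, so only the character likelihoods
    decide the prediction.
    """

    def fit(self, texts: list[str], labels: list[str]) -> "CharDistributionClassifier":
        self.counts: dict[str, Counter] = {}
        for text, label in zip(texts, labels):
            self.counts.setdefault(label, Counter()).update(text)
        # Vocabulary shared by all languages, so each one smooths over the same alphabet.
        self.vocab = set().union(*self.counts.values())
        return self

    def log_likelihood(self, text: str, label: str) -> float:
        counts = self.counts[label]
        total = sum(counts.values())
        # +1 on the vocabulary size reserves probability mass for unseen characters.
        denominator = total + len(self.vocab) + 1
        return sum(math.log((counts[char] + 1) / denominator) for char in text)

    def predict(self, text: str) -> str:
        return max(self.counts, key=lambda label: self.log_likelihood(text, label))
```

Trained on "good morning world" (eng) and "καλημέρα κόσμε" (ell), predict("καλή") returns "ell". Working in log space keeps long paragraphs from underflowing to zero, and add-one smoothing stops a single unseen character from ruling a language out entirely.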

Development
Run the tests with tox:
$ tox

License:

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
