pyskani 0.1.2

Creator: bradpython12

πŸβ›“οΈπŸ§¬ Pyskani
PyO3 bindings and Python interface to skani, a method for fast genomic identity calculation using sparse chaining.

πŸ—ΊοΈ Overview
skani is a method developed by Jim Shaw
and Yun William Yu for fast and robust
metagenomic sequence comparison through sparse chaining. It improves on
FastANI by being more accurate and much faster, while requiring less memory.
pyskani is a Python module, implemented using the PyO3
framework, that provides bindings to skani. It directly links to the
skani code, which has the following advantages over CLI wrappers:

- pre-built wheels: pyskani is distributed on PyPI and features
  pre-built wheels for common platforms, including x86-64 and Arm64 UNIX.
- single dependency: If your software or your analysis pipeline is
  distributed as a Python package, you can add pyskani as a dependency to
  your project, and stop worrying about the skani binary being present on
  the end-user machine.
- sans I/O: Everything happens in memory, in Python objects you control,
  making it easier to pass your sequences to skani without having to write
  them to a temporary file.
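Because pyskani works on in-memory sequences, FASTA text obtained from any source (an HTTP response, a database field) can be handed to it without a temporary file. The helper below is a minimal, hypothetical sketch of that idea: read_fasta is not part of pyskani, and a real pipeline would more likely use Biopython as in the examples further down. It only turns FASTA text into (identifier, bytes) pairs, bytes being the type the examples pass to the sketch method.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, bytes]]:
    """Yield (identifier, sequence) pairs from a FASTA text stream.

    Hypothetical helper, not part of pyskani: each sequence is returned as
    ASCII-encoded bytes, the type passed to Database.sketch in the examples.
    """
    name = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks).encode("ascii")
            # Keep only the identifier, i.e. the first word of the header.
            name = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks).encode("ascii")


# The FASTA text could come from an HTTP response or a database query;
# here it is a literal string, so nothing is written to disk.
fasta_text = ">contig_1 first contig\nACGTACGT\nGGCC\n>contig_2\nTTTTAAAA\n"
records = list(read_fasta(io.StringIO(fasta_text)))
```

The resulting byte strings can then be passed to pyskani in the same way as in the examples, e.g. database.sketch("my genome", *(seq for _, seq in records)).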

This library is still a work in progress and at an experimental stage,
but it should already provide enough features to be used in a standard pipeline.
πŸ”§ Installing
Pyskani can be installed directly from PyPI,
which hosts some pre-built CPython wheels for x86-64 Unix platforms, as well
as the code required to compile from source with Rust:
$ pip install pyskani


In the event you have to compile the package from source, all the required
Rust libraries are vendored in the source distribution, and a Rust compiler
will be set up automatically if there is none on the host machine.
πŸ’‘ Examples
πŸ“ Creating a database
A database can be created either in memory or using a folder on the machine's
filesystem to store the sketches. Regardless of the storage, a database
can be used immediately for querying, or saved to a different location.
Here is how to create a database in memory,
using Biopython
to load the record:
import Bio.SeqIO
import pyskani

database = pyskani.Database()
record = Bio.SeqIO.read("vendor/skani/test_files/e.coli-EC590.fasta", "fasta")
database.sketch("E. coli EC590", bytes(record.seq))

For draft genomes made of several contigs, pass each contig as an additional
argument to the sketch method; the splat operator (*) does this for any
iterable of sequences:
database = pyskani.Database()
records = Bio.SeqIO.parse("vendor/skani/test_files/e.coli-o157.fasta", "fasta")
sequences = (bytes(record.seq) for record in records)
database.sketch("E. coli O157", *sequences)

πŸ—’οΈ Loading a database
To load a database, either created from skani or pyskani, you can either
load all sketches into memory, for fast querying:
database = pyskani.Database.load("path/to/sketches")

Or load the files lazily to save memory, at the cost of slower querying:
database = pyskani.Database.open("path/to/sketches")

πŸ”Ž Querying a database
Once a database has been created or loaded, use the Database.query method
to compute ANI for some query genomes:
record = Bio.SeqIO.read("vendor/skani/test_files/e.coli-K12.fasta", "fasta")
hits = database.query("E. coli K12", bytes(record.seq))

πŸ”Ž See Also
Computing ANI for closed genomes? You may also be interested in
pyfastani, a Python package for computing ANI
using the FastANI method
developed by Chirag Jain et al.
πŸ’­ Feedback
⚠️ Issue Tracker
Found a bug? Have an enhancement request? Head over to the
GitHub issue tracker if you need
to report or ask something. If you are filing a bug report, please include as
much information as you can about the issue, and try to recreate the same bug
in a simple, easily reproducible situation.
πŸ—οΈ Contributing
Contributions are more than welcome! See
CONTRIBUTING.md
for more details.
βš–οΈ License
This library is provided under the MIT License.
The skani code was written by Jim Shaw
and is distributed under the terms of the MIT License
as well. See vendor/skani/LICENSE for more information. Source distributions
of pyskani vendor additional sources under their own terms using
the cargo vendor
command.
This project is in no way affiliated with, sponsored, or otherwise endorsed
by the original skani authors.
It was developed by Martin Larralde during his
PhD project at the European Molecular Biology Laboratory
in the Zeller team.
